ESGF_Data_Node_Only
Wiki Reorganisation: This page has been classified for reorganisation and given the category MOVE. Its content will be revised and moved to one or more other pages in the new wiki structure.
This guide is intended for ESGF data node managers, particularly CMIP data managers, who are upgrading to the peer-to-peer (P2P) system. The instructions cover both installation from scratch and upgrading an existing installation. The goal is to build a system that can operate within both the new P2P environment and the older 'gateway-centric' system. The guide describes building a 'data-only' system, i.e. one able to _securely_ serve data through a THREDDS Data Server (TDS) and to advertise its presence and services to the rest of the federation.
After the upgrade procedure is complete, the Data Node will be able to serve and authorize users of both the old gateway-centric system and the new P2P system. Additionally, Data Node performance should be much better because the node can cache user attributes instead of querying them repeatedly. Note also that the gateway-centric system will soon be shut down, so it is important to make sure that (a) your data is accessible through the P2P system; and (b) you are able to publish data directly to a P2P Index Node.
If upgrading an existing data node installation, there are several considerations to be aware of:
- The upgraded node can reuse the existing Postgres database and THREDDS catalogs without modification. This means that existing datasets do not need to be rescanned and republished to be part of the P2P system.
- The installation script will note the presence of existing Postgres, THREDDS, and Globus components, and will not alter them by default.
- On the downside, an older installation may have some settings that interfere with the upgrade process. These cases are noted throughout the guide. For example, some environment variables may interact with the installation. The installation script has been tested and found to work correctly in several different upgrade situations.
On the other hand, building a new node from scratch has its own advantages and disadvantages:
- In general, installing on a 'clean' system is the most reliable way to create a new installation.
- However, transferring existing Postgres and THREDDS data from an older installation is somewhat harder. In general, the simplest approach is to rescan and republish all data on the new node.
A "Data Node Only" installation includes the following ESGF components (see Figure):
- The ESGF Node Manager (NM)
- The THREDDS Data Server (TDS), configured with the ESGF access control filters, serving the data through the THREDDS XML catalogs
- The ESGF OpenID Relying Party (ORP), which includes an embedded Authorization Service
- The ESGF publishing application, backed up by a local Postgres database storing metadata about the datasets
A Data Node delegates all "non-data" functionality to other nodes in the federation, which are separately maintained and configured, specifically:
- IdP Nodes. Users register, authenticate and gain access control attributes through external IdP nodes.
* The IdP node with which a user authenticates is determined by the user's OpenID.
* The Registration and Attribute Services that are responsible for assigning and serving the user attributes are determined by the configuration files in the Data Node local file system (some dynamically generated by the Node Manager, some maintained by the local administrators - see later for details)
- Index Nodes. Users find the data served by the Data Node by starting the discovery process at some external Index Node where the data is published.
* The target Index Node URL is specified in the configuration file esg.ini, which is used by the ESGF publisher application (see below regarding _re-pointing_ your node).
* Potentially, different datasets from the same Data Node can be published to different Index Nodes (for example, if the same Data Node hosts both model and observational data and there is a desire to show these datasets prominently on different Index Nodes).
Note that, to scale out, one ESGF site can set up multiple Data Nodes, all attached to a single full P2P Node that provides the IdP and Index services.
Figure: Architecture of a Data Node interacting with external IdP and Index Nodes.
- Check all system prerequisites
- Obtain the ESGF installer script (esg-node)
  - Get the bootstrap script.
  - cd /usr/local/bin; esg-bootstrap
- For upgrades, gather information about postgres users, specifically:
- username and password of the postgres administrative user
- same for the low-privilege user that accesses the esgcet database
- Make sure the /esg directory exists. Some older installations do not have this directory. If necessary:
- Copy esg.ini (publisher configuration file) to /esg/config/esgcet/esg.ini
- Older installations may need to change the dburl entry in esg.ini to have a database type 'postgresql'.
- The new default location of the THREDDS catalogs is /esg/content/thredds. Some older installations may store THREDDS catalogs in /usr/local/tomcat/content/thredds. It may smooth the upgrade to recursively copy existing catalogs to the new default location.
- ESGF Administrator Guide
- ESGF Transition Guide
For safety, back up all critical directories first, specifically:
- /esg
- /usr/local/tomcat (Note: only the webapps subdirectory will be modified when upgrading)
- /usr/local/globus (or $GLOBUS_LOCATION if different)
- Postgres database
- /etc/grid-security
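The directory part of the backup list above can be sketched as a small shell function. This is only an illustration: the function name `backup_dirs` and the destination path in the usage comment are assumptions, not part of esg-node, and the function expects absolute directory paths. The Postgres database is not covered here; a dump tool such as `pg_dumpall` would be one way to capture it.

```shell
# Sketch: archive each listed directory into a backup folder.
# backup_dirs is a hypothetical helper, not an esg-node command.
backup_dirs() (
    dest="$1"; shift
    mkdir -p "$dest" || exit 1
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Turn /usr/local/tomcat into usr_local_tomcat for the archive name.
            name=$(printf '%s' "$dir" | sed 's#^/##; s#/#_#g')
            # Archive relative to / so the tarball holds no absolute paths.
            tar -czf "$dest/$name.tar.gz" -C / "${dir#/}" || exit 1
            echo "archived $dir"
        else
            echo "skipped $dir (not found)"
        fi
    done
)

# Example (run as root; the destination directory is an assumption):
# backup_dirs "/backup/esgf-$(date +%Y%m%d)" /esg /usr/local/tomcat \
#     "${GLOBUS_LOCATION:-/usr/local/globus}" /etc/grid-security
```

Directories that do not exist are reported and skipped rather than treated as errors, since not every installation has all of them.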
Remove all ESG-related environment variables. They can interfere with the installation script. For example, unset:
- MYPROXY_SERVER_DN
- X509_CERT_DIR
- ESG_GATEWAY_SVC_ROOT
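As a rough sketch, the variables named above can be cleared in the shell that will run the installer, and any remaining candidates listed. The helper `list_esg_vars` is hypothetical, and matching on names that start with `ESG` is only a heuristic assumed for this example; it will not catch variables such as X509_CERT_DIR.

```shell
# Clear the variables named in this guide. This only affects the current
# shell, so run it in the same session that will start the installer.
unset MYPROXY_SERVER_DN X509_CERT_DIR ESG_GATEWAY_SVC_ROOT

# Hypothetical helper: print the names of environment variables that
# start with "ESG" (a heuristic, not an official list).
list_esg_vars() {
    env | sed -n 's/^\(ESG[A-Za-z0-9_]*\)=.*/\1/p'
}

# Example:
# list_esg_vars
```

Anything the helper prints (for example ESGINI) deserves a second look before the installer is started.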
Then, for the time being, completely remove the web applications from your current Tomcat installation:
- rm -rf /usr/local/tomcat/webapps/*
Run as root:
% esg-node --type data --install [--no-globus]
Use the --no-globus flag if you are upgrading, and do not want to upgrade Globus as well. This will significantly speed up installation.
The installer prompts for a lot of choices, so it's easy to make an error. If you do, don't worry: just restart the installation script. Your previous choices are saved and will be offered as the default answers. Here are some things to watch for:
- I put in the wrong admin password. Now the installer cannot connect to postgres.
The 'admin' password refers to the Postgres administrative user. If you enter an invalid password, the test connection to the database will fail. Reset the password in /esg/config/.esg_pg_pass and rerun the installation.
What does the 'tomcat password' refer to?
The tomcat username and password are used by the publisher to reinitialize the THREDDS server. For an upgrade, they should match thredds_username and thredds_password in esg.ini
What peer group should I choose?
CMIP5 production data nodes should be in the 'esgf-prod' peer group. Test nodes are in 'esgf-test'.
What IDP node should I choose?
For production nodes, pcmdi9.llnl.gov. For test nodes, esgf-node1.llnl.gov
What index peer should I choose?
If you currently publish to PCMDI, your index peer should be pcmdi9.llnl.gov. Other data nodes should point to the P2P index node at their current publication site.
What is the default peer?
The default peer can be any node in the peer group.
There are multiple certificates presented in a chain. Which one should I choose?
The default certificate is generally the correct choice.
The installer cannot find esg.ini. I am upgrading an existing installation.
esg.ini is the publisher configuration file. The current default location is /esg/config/esgcet/esg.ini. Older installations may not have this directory. If not, just create the directory and copy esg.ini into it from its current location. Also, if you have ESGINI set, either unset it or set it to /esg/config/esgcet/esg.ini.
Peer settings can be modified with esg-node:
- Default peer
% esg-node --set-default-peer
Can be any host in the peer group
Peer group
% esg-node --set-peer-group <group,group,...>
Can specify multiple groups
Index peer
% esg-node --set-index-peer
Where you will publish to, also modifies hessian_service_url in esg.ini
Identity provider
% esg-node --set-idp-peer
Choose from the menu provided
At the end of the installation, you want to make sure you install the latest trusted ESGF certificates:
% esg-node --force --rebuild-truststore
- It is important to make sure that the Data Node belongs to the same peer group as the other IdP and Index nodes with which it has to interact. Otherwise users won't be able to log in or register for data access, because the URLs won't be found in the list of trusted IdPs and Attribute Services. At the same time, you must also select a default peer in the same group with which to exchange information. The relevant properties are node.peer.group= and esgf.default.peer= in the file /esg/config/esgf.properties (see also the notes on how to change these values through the node script).
  - If you are configuring a Data Node that is part of the CMIP5 production environment, you should set these properties to node.peer.group=esgf-prod and esgf.default.peer=pcmdi9.llnl.gov.
  - If you are configuring a Data Node for testing purposes before joining the production ESGF group, you should set these properties to node.peer.group=esgf-test and esgf.default.peer=esgf-node1.llnl.gov.
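To see which values are currently in effect, the two properties can be read straight from the properties file. The following is a minimal sketch; `get_prop` and `check_peer_settings` are hypothetical helpers, and the parsing assumes simple `key=value` lines with no line continuations.

```shell
# Hypothetical helper: print the value of one key from a properties file.
get_prop() (
    # Escape dots so that "node.peer.group" is matched literally.
    key_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
)

# Print both federation properties; fail if either one is missing or empty.
check_peer_settings() (
    file="${1:-/esg/config/esgf.properties}"
    group=$(get_prop node.peer.group "$file")
    peer=$(get_prop esgf.default.peer "$file")
    echo "node.peer.group=$group"
    echo "esgf.default.peer=$peer"
    [ -n "$group" ] && [ -n "$peer" ]
)

# Example:
# check_peer_settings /esg/config/esgf.properties
```

This only reports what is in the file; whether the default peer actually shares a group with the node is a separate check.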
- Make sure that you have a file /esg/config/esgf_ats_static.xml that contains the default URLs of the Attribute and Registration Services needed for CMIP5 data access, with the following content:
<ats_whitelist xmlns="http://www.esgf.org/whitelist">
  <!-- pcmdi7 Attribute Service: guarantees that users who register with a gateway are immediately recognized by the p2p system -->
  <attribute type="CMIP5 Research" attributeService="https://pcmdi7.llnl.gov/esgf-security/saml/soap/secure/attributeService.htm" description="Users of CMIP5 data for non-commercial research purposes only" />
  <attribute type="CMIP5 Commercial" attributeService="https://pcmdi7.llnl.gov/esgf-security/saml/soap/secure/attributeService.htm" description="Users of CMIP5 data for commercial purposes" />
</ats_whitelist>
- Note that, if your node is part of the "esgf-prod" group, some of these URLs may be duplicated in the file /esg/config/esgf_ats.xml, which is dynamically created by the node manager. This is not a problem since duplicate URLs will be collapsed into single entries.
- Also make sure that the THREDDS Authorization Filter is configured to query the Authorization Service embedded in the local ORP. Specifically, the file webapps/thredds/WEB-INF/web.xml should have the following entry (with the obvious substitution):
<init-param>
  <param-name>authorizationServiceUrl</param-name>
  <param-value>https://<your data node host name>/esg-orp/saml/soap/secure/authorizationService.htm</param-value>
</init-param>
- The local Authorization Service is able to query all necessary remote Attribute Services, as found in the "esgf_ats*.xml" files mentioned above, and to cache the corresponding attribute statements.
- CMIP5 Data Nodes, please note: you must configure the access control policies for your local data in the file /esg/config/esgf_policies_local.xml:
* To allow CMIP5 users to download data (i.e. "Read"), you insert policy statements that match your data URLs (through regular expressions) to the group "CMIP5 Research", and possibly "CMIP5 Commercial", for the standard role of "user" (and the legacy role of "default", which will disappear in the future). Read access to CMIP5 data may not be enabled by default!
- To allow the local data managers to publish data to a P2P Index Node (i.e. "Write"), the Index Node must be configured with a policy statement that matches the identifiers of the datasets to be published (for a chosen group, and role of "publisher"):
  - If data is published to the local P2P Node (i.e. the Data Node is also an Index Node), the publishing policy statement must be entered in the local _esgf_policies_local.xml_ file.
  - If data is published to a remote P2P Index Node (for example, the pcmdi9 CMIP5 Index Node), such a statement has to be entered by the administrators of the remote node in the file _esgf_policies_local.xml_ on the remote server.
- An example _esgf_policies_local.xml_ file follows, customized for CMIP5 data hosted on the NASA/JPL Data Node:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<policies xmlns="http://www.esgf.org/security">
<!-- CMIP5 Research access control -->
<policy resource=".*cmip5.*" attribute_type="CMIP5 Research" attribute_value="user" action="Read"/>
<policy resource=".*cmip5.*" attribute_type="CMIP5 Commercial" attribute_value="user" action="Read"/>
<policy resource=".*cmip5.*" attribute_type="CMIP5 Research" attribute_value="default" action="Read"/>
<policy resource=".*cmip5.*" attribute_type="CMIP5 Commercial" attribute_value="default" action="Read"/>
<!-- NASA-JPL access control -->
<policy resource=".*NASA-JPL.*" attribute_type="NASA OBS" attribute_value="publisher" action="Write"/>
</policies>
- If you need to do anything more complicated, please consult the ESGF Access Control Guide.
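To get a feel for how the resource patterns in the policy file select data URLs, the matching can be approximated in a few lines of shell. This is a sketch under stated assumptions, not the ESGF implementation: `matching_policies` is a hypothetical helper, it assumes one `<policy .../>` element per line, it treats each pattern as a POSIX extended regular expression, and it assumes the pattern must match the whole resource string.

```shell
# Hypothetical helper: print the <policy> lines whose resource pattern
# matches the given URL or dataset identifier.
matching_policies() (
    resource="$1"
    file="${2:-/esg/config/esgf_policies_local.xml}"
    grep '<policy ' "$file" | while IFS= read -r line; do
        # Extract the value of the resource="..." attribute.
        pattern=$(printf '%s\n' "$line" | sed -n 's/.*resource="\([^"]*\)".*/\1/p')
        [ -n "$pattern" ] || continue
        # Assumption: the pattern has to match the entire resource string.
        if printf '%s\n' "$resource" | grep -Eq -- "^(${pattern})\$"; then
            printf '%s\n' "$line" | sed 's/^[[:space:]]*//'
        fi
    done
)

# Example (the URL is made up for illustration):
# matching_policies "http://HOSTNAME/thredds/fileServer/cmip5/output/file.nc"
```

With the example file above, a URL containing `cmip5` would be expected to select the four Read policies and nothing else; an empty result suggests that no policy covers the resource.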
% esg-node restart
Note on Java memory requirements:
The default Java memory settings in esg-node are:
-Xmx2048m -Xms1024m -XX:MaxPermSize=512m
These values are too small for most production environments. To increase memory, use the JAVA_OPTS environment variable. This can be set in /etc/esg.env, for example:
export JAVA_OPTS="-Xmx15G -Xms10G -XX:MaxPermSize=512m"
In a browser, check the following locations:
- Node manager. You should see a nifty diagram.
- http://HOSTNAME/esgf-node-manager/registration.xml : XML Registry, showing the peer nodes and service endpoints.
- http://HOSTNAME/thredds/catalog.html : THREDDS catalog.
- https://HOSTNAME/esg-orp/home.htm : OpenID Relying Party. You should see a Data Access Login page.
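The same locations can also be probed from the command line. The sketch below assumes `curl` is installed; `service_urls` and `check_services` are hypothetical helpers, and the 20-second timeout is an arbitrary choice. A browser check is still worthwhile, since a status code alone does not show whether the page content is right.

```shell
# Hypothetical helper: print the service URLs listed above for one host.
service_urls() (
    host="$1"
    echo "http://$host/esgf-node-manager/registration.xml"
    echo "http://$host/thredds/catalog.html"
    echo "https://$host/esg-orp/home.htm"
)

# Request each URL and print the HTTP status code next to it.
# -k disables certificate verification; drop it once the node's
# certificate is trusted by the machine running the check.
check_services() (
    service_urls "$1" | while IFS= read -r url; do
        code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 20 "$url")
        echo "$code $url"
    done
)

# Example:
# check_services HOSTNAME
```

A code of 000 from curl generally means that no HTTP response was received at all, which points at the network, the port, or Tomcat not running rather than at a single web application.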
If any of these services are not working correctly, the installation is not complete! Diagnose the problem then rerun the installation. To help fix the problem:
- Check the tomcat server logs in /usr/local/tomcat/logs/catalina.out and catalina.error for clues to the problem.
- Check the installation components in /esg/esgf-install-manifest. If the node manager, ORP, or any other web application appears to be out of date, remove the corresponding webapp from /usr/local/tomcat and reinstall.
Some possible errors:
Problem: The ORP would not start because of a missing / invalid keystore alias.
Solution: Make sure that the tomcat server.xml SSL connector has the attribute
keyAlias="MY_KEYSTORE_ALIAS"
Use the command
% keytool -list -v -keystore /usr/local/tomcat/conf/keystore-tomcat
to determine the value of the keystore alias. Restart tomcat to make the keystore alias value effective.
Problem: The IDP I specified is not whitelisted.
Solution: Add the IDP to /esg/config/esgf_idp_static.xml. For example, to allow use of pcmdi9, add
<value>https://pcmdi9.llnl.gov/esgf-idp/idp/openidServer.htm</value>
Run the installation verification step:
% esg-node --verify
This will check port connections (ignore port 8009 errors) and run a test publication. If the test publication fails, check that hessian_service_url is set correctly in the publisher esg.ini configuration. Note that peer-to-peer publication endpoints are different from the old (gateway-centric) system:
peer-to-peer: https://INDEXHOST/esg-search/remote/secure/client-cert/hessian/publishingService
gateway: https://INDEXHOST/esgcet/remote/secure/client-cert/hessian/publishingService
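A quick way to tell which kind of endpoint the publisher is currently pointed at is to compare hessian_service_url against the two URL shapes above. This is a sketch only: `endpoint_kind` and `publication_target` are hypothetical helpers, and the parsing assumes the option appears in esg.ini as a single `hessian_service_url = ...` line.

```shell
# Hypothetical helper: classify a publishing endpoint by its URL shape.
endpoint_kind() {
    case "$1" in
        https://*/esg-search/remote/secure/client-cert/hessian/publishingService)
            echo p2p ;;
        https://*/esgcet/remote/secure/client-cert/hessian/publishingService)
            echo gateway ;;
        *)
            echo unknown ;;
    esac
}

# Read hessian_service_url from esg.ini and print its kind and value.
publication_target() (
    file="${1:-/esg/config/esgcet/esg.ini}"
    url=$(sed -n 's/^[[:space:]]*hessian_service_url[[:space:]]*=[[:space:]]*//p' "$file" | tail -n 1)
    echo "$(endpoint_kind "$url") $url"
)

# Example:
# publication_target /esg/config/esgcet/esg.ini
```

An `unknown` result does not necessarily mean the setting is wrong, only that it matches neither of the two patterns shown in this guide.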
Test a file download through the Web front-end. If access to the file is restricted, modify the access policies in:
/esg/config/esgf_policies_local.xml
to allow the download.
To test that the URL in question triggers the correct policy, run the esg-node script with the --policy-check flag:
-------------------------------
Security Policy...
-------------------------------
--policy-check <resource string> - returns the list of policies that are triggered by the provided resource
If no policies are triggered the resource will certainly NOT be made available.
Reset the publication target.
After publication to a P2P index node has been tested, it may be necessary to publish to a gateway-centric node. Production data should not be published to P2P nodes until the transition process is completed. To toggle between P2P and gateway-centric publication:
Set the publication target to a gateway:
% esg-node --set-publication-target pcmdi3.llnl.gov/esgcet
Set the publication target to a P2P node:
% esg-node --set-index-peer pcmdi9.llnl.gov
CMIP5 Data Nodes : Please notify the pcmdi3 administrator when you have updated your data node, so that the gateway settings can be modified to support downloads. This is particularly important if the previous data node installation relied on token-based authentication.
The esg-node installation script is also the boot script.
To stop, start, restart, or check the status of the node:
%> esg-node stop
%> esg-node start
%> esg-node status
%> esg-node restart
These and all the other flags supported by the esg-node script can be listed with the --help option:
%> esg-node --help
usage:
(as root)
esg-node ([--<directive>] | [start] | [stop] | [status] | [restart]
...
The script may be placed in /etc/init.d and installed in the host's boot sequence using _chkconfig_. This means it follows the _standard options_ for all service scripts, namely start, stop, restart, and status (as described above). To add it to the host's boot sequence, do the following:
%> cp /usr/local/bin/esg-node /etc/init.d
%> cd /etc/init.d
%> chkconfig --add esg-node
- During installation you were presented with a menu to select your IDP or specify a custom IDP. After installation you may _re-point_ your node to another IDP at any time with the command: esg-node --set-idp-peer [return]. Again, keep in mind the previous item.
- During installation you specify the target index node. The target index node can be _re-pointed_ to another index node after installation by using the command: esg-node --set-index-peer.
There are a few flags in the esg-node script that allow you to manipulate information regarding your node's federation settings.
%> esg-node --help
...
-------------------------------
Federation / Node Relationship Flags:
-------------------------------
--set-default-peer
--get-default-peer - tells you this node's currently configured default peer
--set-peer-group <name of group(s), comma delim, you wish to participate>
--get-peer-group - tells you this node's currenly configured peer group(s)
--federation-sanity-check - tells you if your node's configured peer groups intersect with default peer
--spotcheck - provides basic federation mesh (inter-connectivity) information.
...
The important thing is for each peer to know at least one other peer, the default peer, to allow information exchange. This default peer should itself be connected to other peers, and so on. The default peer must also share at least one peer group with your node as a requirement for information exchange. Above we discussed how to set the index peer and the IDP peer. Now we discuss how to get and set the default peer and your peer group, as well as how to get a sense of the _sanity_ of your P2P relationships. The following examples illustrate the flags above.
(For these examples the commands were issued on esgf-node1.llnl.gov)
%> esg-node --get-default-peer
Current "default" Peer: [esgf-node3.llnl.gov]
%> esg-node --set-default-peer esgf-node3.llnl.gov
This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov [OK]
Default Peer set to: [esgf-node3.llnl.gov]
(restart node to enable default peer value)
%> esg-node --get-peer-group
Configured to participate in peer group(s): [esgf-gavin-test2,esgf-test]
%> esg-node --set-peer-group esgf-gavin-test2,esgf-test
This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov [OK]
Peer Group is set to: [esgf-gavin-test2,esgf-test]
(restart node to enable group value)
(These examples are a bit contrived as I am essentially re-setting the values, but the illustration is sound)
These commands edit the esgf.properties file discussed earlier.
To check that you are connected to a peer that shares at least one peer group with the node you are on, use the --federation-sanity-check flag. This flag also indicates the type of node that you are pointing to. By default it looks for the default, index and IDP peers and shows the group membership intersection.
%> esg-node --federation-sanity-check
This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node3.llnl.gov (default ) [OK]
This node shares the group(s) [esgf-gavin-test2, esgf-test] with esgf-node1.llnl.gov (idp index) [OK]
To see what the federation looks like from your node's vantage point (or from any other node, for that matter), use the --spotcheck flag (below).
%> esg-node --spotcheck
Spot Checking [localhost]...
[1] looking at site http://adm08.cmcc.it/esgf-node-manager/registration.xml -> 4
[2] looking at site http://esgf-node1.llnl.gov/esgf-node-manager/registration.xml -> 4
[3] looking at site http://esgf-node2.llnl.gov/esgf-node-manager/registration.xml -> 4
[4] looking at site http://esgf-node3.llnl.gov/esgf-node-manager/registration.xml -> 4
Here we see a small fully meshed federation of 4 nodes (4x4) SweeeT!
It is very important to keep certificates updated. Each node admin is responsible for this task. At the time of this writing, it means periodically running %> esg-node --force --rebuild-truststore followed by a restart. We will improve this process by making it more automatic, but at this moment almost all connectivity issues between nodes, especially when downloading from data nodes, can be attributed to improper upkeep of federation certificates.
- "Authorization Denied" when trying to download data . This problem is most likely caused by an improper access control configuration. The best strategy to solve the problem is to execute a data download request from the web, and at the same time monitor the log statements in the following files:
* To watch the Tomcat log file: tail -f /usr/local/tomcat/logs/catalina.out
* To watch the TDS log file: tail -f /esg/content/thredds/logs/threddsServlet.log
Then double check your access control configuration as described above, and if the problem persists email ESGF Support with the relevant parts of the log files.
There are four node _types_: "DATA", "INDEX", "IDP", and "COMPUTE"*. These node types may be co-located on a single host in any combination*. In a distributed ESGF setup the number of nodes of each type is typically not equal, i.e. no 1:1:1:1 relationship is enforced among the types. You would usually have a single IDP node, one or two INDEX nodes depending on how you wish to partition the index of your publications, and DATA nodes wherever you have your data (caveat: DATA nodes must be reachable for downloading!). This scenario gives a topology of DATA:n INDEX:2 IDP:1. There is almost always only a single IDP to maintain user information, and for practical purposes there is usually a single INDEX node as well. The DATA nodes should be distributed along data partition lines (determined by your organization). DATA nodes should be set up judiciously, not everywhere you have data, especially given the caveat above. It is up to your organization to decide which distribution pattern fits. The ESGF architecture is designed to scale out.
(* The COMPUTE type must be co-located with a DATA node, because computation is expected to run as close as possible to the data it operates on.)