Basic File Management in COAL SDS
This wiki entry provides a guide for running coal-sds as an end-to-end science data system. Note that you may need to be root to execute some of the commands.
Please ensure you have all of the prerequisite software installed. In particular, you should now have an unpacked coal-sds deployment available on your filesystem at /usr/local/coal-sds-deploy. The following documentation assumes you have executed a cd into that directory. The unpacked coal-sds contents will look as follows:
[ec2-user@ip-172-31-28-45 coal-sds-deploy]$ ls -al
total 56
drwxr-xr-x 14 root root 4096 Apr 16 02:12 .
drwxr-xr-x 14 root root 4096 Apr 16 02:12 ..
drwxrwxrwx 2 root root 4096 Apr 16 01:51 bin
drwxr-xr-x 7 root root 4096 Apr 16 02:12 crawler
drwxrwxrwx 7 root root 4096 Apr 16 02:07 data
drwxr-xr-x 4 root root 4096 Apr 16 02:12 extensions
drwxr-xr-x 8 root root 4096 Apr 16 02:12 filemgr
drwxrwxrwx 2 root root 4096 Apr 16 02:07 logs
drwxr-xr-x 8 root root 4096 Apr 16 02:12 pcs
drwxr-xr-x 5 root root 4096 Apr 16 02:12 pge
drwxr-xr-x 8 root root 4096 Apr 16 02:12 resmgr
drwxr-xr-x 3 root root 4096 Apr 16 02:12 solr
drwxrwxrwx 11 root root 4096 Apr 16 02:06 tomcat
drwxr-xr-x 8 root root 4096 Apr 16 02:12 workflow
General information on OODT File Management can be found here. To make this specific to COAL-SDS, however, the steps below provide a blueprint for basic file management within the SDS infrastructure.
Important files associated with the file manager include
- filemgr.properties - basic properties for the file management service,
- product-types.xml - XML descriptor for the type of files and metadata extraction functionality you wish to associate with the file management service.
The above files are already configured and ready for deployment, so once you have deployed COAL-SDS from source (as described above) you can start the filemgr service so that it (re-)reads filemgr.properties and product-types.xml.
In addition to File Management, COAL-SDS provides a number of other services. Without getting into what they are, we will start the entire SDS now and use only the File Management portion for the time being. Execute the following commands:
$ cd /usr/local/coal-sds-deploy/bin
$ ./oodt start
As explained in the OODT documentation (see NOTE 2) after you launch oodt, you may observe the following output:
Using CLASSPATH:
-e Starting OODT File Manager [ Failed ]
-e Starting OODT Resource Manager [ Failed ]
-e Starting OODT Workflow Manager [ Failed ]
Don't be confused by this. To see whether OODT is running, open a browser to http://localhost:8080/opsui and click on the PCS Status link to get detailed information about running processes. A green arrow indicates that the corresponding process is running correctly.
You may also need to kill whichever process is running on port 9000 before this command will execute successfully; otherwise you will encounter a java.net.BindException: Address already in use (Bind failed).
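One way to check for the port conflict described above before starting the services is sketched below. It assumes bash (the /dev/tcp redirection is a bash feature, not plain sh) and that the services listen on localhost. The helper name port_in_use is hypothetical, and only the two ports mentioned on this page are checked.

```shell
# Hypothetical helper: succeeds if something accepts TCP connections on
# localhost:<port>. Relies on bash's /dev/tcp redirection.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# 9000 is the File Manager port and 8080 serves the OPSUI, as used on this page.
for port in 9000 8080; do
    if port_in_use "$port"; then
        echo "port $port: in use"
    else
        echo "port $port: free"
    fi
done
```

If port 9000 is reported as in use by something other than the File Manager, stop that process before running ./oodt start again.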
What have we done above?
Well... we've defined and configured
- A place to store the COAL catalog, i.e. the database of metadata.
- A place to store ingested files, i.e. the repository.
- The location of your policy directory for pycoal product specifications.
- Your mime-types configuration file for pycoal product recognition.
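To see where these locations are actually set, the relevant entries can be pulled out of filemgr.properties. The sketch below is only illustrative: show_filemgr_settings is a hypothetical helper, the keyword list is a guess at which property names are relevant, and the location of filemgr.properties inside the deployment is not stated on this page, so the file path is passed in as an argument.

```shell
# Hypothetical helper: print the non-comment lines of a properties file that
# mention catalog, repository, validation, policy or mime settings.
show_filemgr_settings() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -i -E 'catalog|repository|validation|policy|mime'
}
```

It would be called as show_filemgr_settings /path/to/filemgr.properties, substituting the real path in your deployment.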
If you really want, now is a good time to read about exactly how coal-sds metadata is collected. Otherwise, continue as follows and we will ingest some files and then query the catalog.
If you have read the OODT File Management User Guide you will already know that there are a number of ways to ingest products into the Catalog. Using COAL-SDS, follow the directions below to manually ingest your first COAL product.
If you have run the pycoal tests or examples you should have some COAL products lying around. Transfer them into a staging directory so that we can locate them and ingest them into the Catalog. Ensure that you copy them into the correct data/staging directory in order to keep the data workflow consistent, tidy and structured.
$ cp /path/to/ang20150420t182050_corr_v1e_img.hdr /usr/local/coal-sds-deploy/data/staging/
Now create a metadata file (notice the .met suffix below) to accompany the product we have staged above. Ensure that the metadata file resides within the same staging directory.
$ touch /usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.met
Now add the following VERY basic metadata to the metadata file:
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>
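The two steps above (creating the .met file and adding the empty metadata element) can also be done in one go with a here-document. This is a minimal sketch: write_basic_met is a hypothetical helper name, and it writes exactly the empty cas:metadata element shown above next to whatever product path it is given.

```shell
# Hypothetical helper: create <product>.met containing the empty
# cas:metadata element, overwriting any existing .met file.
write_basic_met() {
    product="$1"
    cat > "${product}.met" <<'EOF'
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>
EOF
}
```

For the product staged above, this would be invoked as write_basic_met /usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.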
We are now ready to progress to ingesting some data!
Assuming that the File Manager service is up and running, execute the following commands from within the filemgr/bin directory:
$ cd /usr/local/coal-sds-deploy/filemgr/bin
$ ./filemgr-client --operation --ingestProduct --productName ang20150420t182050_corr_v1e_img.hdr --productStructure Flat --productTypeName GenericFile --metadataFile file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.met --refs file:///usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr --url http://localhost:9000
For details on what the above means, you should read the OODT File Management User Guide.
If all goes well you should see some logging in the terminal and the following output
...
INFO: Successful ingest of product: [/usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr]
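When several staged products need ingesting, the command above can be wrapped in a small helper. The following is a sketch under stated assumptions, not part of COAL-SDS: ingest_product, FILEMGR_BIN and FILEMGR_URL are hypothetical names, the flags are copied unchanged from the command above, and every product is ingested as a Flat GenericFile. The product path must be absolute so that the file:// URLs are well formed.

```shell
# Defaults taken from the manual steps above; override for other deployments.
FILEMGR_BIN="${FILEMGR_BIN:-/usr/local/coal-sds-deploy/filemgr/bin}"
FILEMGR_URL="${FILEMGR_URL:-http://localhost:9000}"

# Hypothetical helper: ingest one staged product (absolute path) that has a
# matching <product>.met file next to it.
ingest_product() {
    product="$1"
    if [ ! -f "${product}.met" ]; then
        echo "missing metadata file: ${product}.met" >&2
        return 1
    fi
    # Run from the bin directory, as in the manual steps above.
    (
        cd "$FILEMGR_BIN" || exit 1
        ./filemgr-client --operation --ingestProduct \
            --productName "$(basename "$product")" \
            --productStructure Flat \
            --productTypeName GenericFile \
            --metadataFile "file://${product}.met" \
            --refs "file://${product}" \
            --url "$FILEMGR_URL"
    )
}
```

For the product used on this page, the call would be ingest_product /usr/local/coal-sds-deploy/data/staging/ang20150420t182050_corr_v1e_img.hdr.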
You can now also look at this within the File Management portion of the OPSUI web application at http://localhost:8080/opsui.
Now let's query the catalog to see if we can obtain some information on the ingested product.
You should already be in /usr/local/coal-sds-deploy/filemgr/bin, so just execute the following
$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'
The output should provide information on the product you just ingested.
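The same query can be parameterised by product type. This sketch reuses the query-tool invocation above unchanged; query_product_type, FILEMGR_BIN and FILEMGR_URL are hypothetical names, and GenericFile is the only product type that appears on this page.

```shell
# Defaults taken from the manual steps above; override for other deployments.
FILEMGR_BIN="${FILEMGR_BIN:-/usr/local/coal-sds-deploy/filemgr/bin}"
FILEMGR_URL="${FILEMGR_URL:-http://localhost:9000}"

# Hypothetical helper: list everything catalogued under one product type.
query_product_type() {
    (
        cd "$FILEMGR_BIN" || exit 1
        ./query-tool --url "$FILEMGR_URL" --sql -query "SELECT * FROM $1"
    )
}
```

Calling query_product_type GenericFile is equivalent to the command shown above.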
The above is an extremely basic, highly manual method for utilizing the File Management services offered by COAL-SDS. COAL-SDS is much more comprehensive than this. It enables us to automate the generation and maintenance of a metadata-rich product catalog, automate the staging of data products by acquiring them from remote resources such as AWS S3, FTP and other places, and even automate the generation of pycoal tasks via workflow executions. Basic File Management is literally the VERY beginning of the COAL-SDS system.