transformation error causing USGS records to not get harvested #91

Open
amilan17 opened this issue Jan 12, 2015 · 6 comments

Comments

@amilan17

Many records from USGS fail with a transformation error caused by a bug in the FGDC-to-ISO XSLT.

This is one of the CSDGM records causing this error:
http://data.usgs.gov/metadata/Mineral_Resources_On-Line_Spatial_Data/535e99ace4b08e65d60f8e2b.xml

This is the error message:

Transformation to ISO failed
The transformation service returned an error for object {0}: [409] net.sf.saxon.trans.XPathException: A sequence of more than one item is not allowed as the first argument of normalize-space() ("http://pubs.usgs.gov/of/1997/o...", "http://pubs.usgs.gov/of/1997/o...", ...)

This is where the transform takes all the URLs from all of the elements and puts them into ONE CI_OnlineResource/linkage/URL element:
https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt#L4308
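The error is a cardinality mismatch: in XPath 2.0 (which Saxon implements), normalize-space() accepts zero or one item, but here the path selects every URL at once. The Python sketch below illustrates this using only the standard library's xml.etree.ElementTree, not Saxon or the project's XSLT. The helper `normalize_space` is hypothetical and only mimics the XPath 2.0 rule, so you can see the difference between passing the whole sequence and iterating item by item. The sample data is the <networka> element quoted later in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the <networka> element quoted later in this thread.
SNIPPET = """<networka>
  <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_e00.gz</networkr>
  <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_msat_e00.gz</networkr>
  <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_e00.gz</networkr>
  <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_msat_e00.gz</networkr>
</networka>"""

XML_WS = " \t\r\n"  # the whitespace characters XPath normalize-space() handles


def normalize_space(items):
    """Hypothetical stand-in for XPath 2.0 normalize-space($arg as xs:string?).

    Accepts a list of zero or one strings; a longer list raises, mirroring
    the XPathException Saxon reported.
    """
    if len(items) > 1:
        raise TypeError("A sequence of more than one item is not allowed "
                        "as the first argument of normalize-space()")
    text = items[0] if items else ""
    # Collapse runs of XML whitespace to one space, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", text).strip(XML_WS)


networka = ET.fromstring(SNIPPET)
urls = [el.text or "" for el in networka.findall("networkr")]

# Passing the whole sequence fails, as in the harvest error.
try:
    normalize_space(urls)
except TypeError as err:
    print("fails:", err)

# Handling one item at a time (the xsl:for-each pattern) succeeds.
for url in urls:
    print(normalize_space([url]))
```

In XSLT terms, the equivalent fix would be to iterate over the selected nodes with xsl:for-each (or to select a single item) rather than to pass the whole node sequence to normalize-space().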

Thanks in advance for looking into this. I'm available for questions or further testing if needed.

amilan17 changed the title from "fgdc2iso transform introduces multiple URLS into one field" to "transformation error causing USGS records to not get harvested" on Jan 12, 2015
@kvuppala
Contributor

@amilan17
In FGDC metadata, is it allowed to have multiple URLs under the <networka> tag, as in the example below? It looks like the ISO transformation expects only one value here. Should the transformation convert each of these links into a separate resource in the CKAN catalog, and apply the same name and description, as defined under the <digtinfo> tag, to all of those resources?

<digform>
    <digtinfo>
        <formname>Arc/Info export</formname>
        <formvern>7.x</formvern>
        <formcont> Gridded files for the Alaska composite (akc*) and merged (akm*) aeromagnetic data. New versions of the grids were added to the web site in February 1999. These grids are akc_msat* and akm_msat*.  The new grids contain a regional surface correction based on a satellite magnetic model of the long wavelengths of the Earth's magnetic field (see March 1999 issue of GSA Today for more information).  The original grids contained a questionable long-wavelength trend which caused the NW portion of the grids to be tipped downward (there was also a spurious trend with a different slope in SE Alaska). </formcont>
        <filedec>gzip -d</filedec>
        <transize>8.7</transize>
    </digtinfo>
    <digtopt>
        <onlinopt>
            <computer>
                <networka>
                    <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_e00.gz</networkr>
                    <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_msat_e00.gz</networkr>
                    <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_e00.gz</networkr>
                    <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_msat_e00.gz</networkr>
                </networka>
            </computer>
        </onlinopt>
    </digtopt>
</digform>
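The option raised in this question is one CKAN resource per <networkr> URL, with the format's name (<formname>) and description (<formcont>) copied onto each. A minimal Python sketch of that mapping follows. It is a hypothetical illustration using the standard library's ElementTree, not ckanext-geodatagov code. The function name `resources_from_digform` is invented here, and the `url`/`name`/`description` keys are assumed to loosely follow CKAN's resource fields.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <digform> element above (formcont shortened, two URLs).
DIGFORM = """<digform>
  <digtinfo>
    <formname>Arc/Info export</formname>
    <formvern>7.x</formvern>
    <formcont>Gridded files for the Alaska composite (akc*) and merged (akm*) aeromagnetic data.</formcont>
  </digtinfo>
  <digtopt>
    <onlinopt>
      <computer>
        <networka>
          <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_e00.gz</networkr>
          <networkr>http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_e00.gz</networkr>
        </networka>
      </computer>
    </onlinopt>
  </digtopt>
</digform>"""


def resources_from_digform(digform):
    """Hypothetical mapping: one resource dict per <networkr>, all sharing
    the format name and description taken from <digtinfo>."""
    name = (digform.findtext("digtinfo/formname") or "").strip()
    description = (digform.findtext("digtinfo/formcont") or "").strip()
    resources = []
    for networkr in digform.iterfind(".//networka/networkr"):
        resources.append({
            "url": (networkr.text or "").strip(),
            "name": name,                # identical for every URL
            "description": description,  # identical for every URL
        })
    return resources


for res in resources_from_digform(ET.fromstring(DIGFORM)):
    print(res["url"], "|", res["name"])
```

Every resource ends up with the same name and description, because those fields describe the file format rather than each individual URL.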

@amilan17
Author

@kvuppala
Yes, this XML structure is completely valid FGDC XML. I don't like assigning identical names and descriptions to different URLs in the resulting ISO, and technically it isn't an accurate mapping, because those names and descriptions describe the format, not the URL. I think it would be more correct to reuse the URL as the name of each CI_OnlineResource and leave the description field empty.
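The mapping proposed here is one CI_OnlineResource per URL, with the URL reused as the name and no description. It can be sketched in Python with the standard library's ElementTree. This is an illustration, not the project's XSLT, and `online_resource` is a hypothetical helper. The element names and namespaces follow ISO 19139 (gmd/gco). In a full record, each resource would presumably sit inside its own gmd:onLine element.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces used by CI_OnlineResource.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def online_resource(url):
    """Hypothetical helper: build one gmd:CI_OnlineResource for a single URL.

    The URL is reused as gmd:name; gmd:description is deliberately omitted.
    """
    res = ET.Element(f"{{{GMD}}}CI_OnlineResource")
    linkage = ET.SubElement(res, f"{{{GMD}}}linkage")
    ET.SubElement(linkage, f"{{{GMD}}}URL").text = url
    name = ET.SubElement(res, f"{{{GMD}}}name")
    ET.SubElement(name, f"{{{GCO}}}CharacterString").text = url
    return res


urls = [
    "http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akc_e00.gz",
    "http://pubs.usgs.gov/of/1997/ofr-97-0520/data/akm_e00.gz",
]
# One resource element per URL instead of one element holding every URL.
for url in urls:
    print(ET.tostring(online_resource(url), encoding="unicode"))
```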

@amilan17
Author

@kvuppala @FuhuXia
I think these errors were introduced during this commit:
ef33815

@kvuppala
Contributor

@amilan17
Thank you. We are looking into this to see how we can accommodate both requirements: harvesting all the links provided in the tags, and feature #86 (the commit above addresses #86).

@kvuppala
Contributor

This issue is similar to #90.

@kvuppala
Contributor

More documentation and a proposed solution (option 2) are available at https://docs.google.com/document/d/1wOHSA2RNwjsgDuqDzFTKifQceTBxxvMBmDssnRLivLA/edit
