
Mapping


You've reached the field mapping part of this guide. Both parts of the documentation (for systems administrators and for portal managers) introduce this concept, so you should be ready to tackle this challenge. Ideally, the systems administrator and the portal manager should be reading this together right now, as both skill sets will be required.

The file referenced previously is an XML configuration file that takes XML records from an OAI feed and maps their values into a flattened representation of the data needed by Solr and the portal front-end. There are two important sections in this XML file: the attributes found within the opening entity element and the xpath attributes found within each field element.

Entity element attributes

Update the url attribute to reflect your OAI feed, using the format http://YOUR_URL. Update the prefix attribute to match the prefix value set in your OAI feed. Note that the idCol attribute is automatically set to match the column attribute of the identifier field; this is the unique value for each record.
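For reference, the opening entity element will look roughly like the sketch below. Only the url, prefix, and idCol attributes are discussed in this guide; the other attributes and values shown here (name, processor, forEach, and the oai_dc example prefix) are illustrative placeholders, so keep whatever appears in the data-config.xml that ships with the portal and edit only the values this guide calls out.

<!-- Sketch only: edit url, prefix, and idCol; leave the rest as shipped -->
<entity name="record"
        processor="XPathEntityProcessor"
        url="http://YOUR_URL"
        prefix="oai_dc"
        idCol="identifier"
        forEach="/OAI-PMH/ListRecords/record">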

Field elements

Each field element has two parts: the column attribute, which corresponds to a field name set within Solr (don't change it unless you know what you're doing), and the xpath attribute, which holds the XPath expression needed to traverse the record and find the value you're looking for.
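For example, a field element mapping a Dublin Core title might look like the line below. The column value comes from the Solr schema; the xpath value shown here is purely illustrative and will need to match the structure of your own feed.

<!-- Illustrative only: replace the xpath with the path used by your OAI feed -->
<field column="title" xpath="/OAI-PMH/ListRecords/record/metadata/dc/title" />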

Testing Configurations

It is recommended that you test all of these values before you install Solr. Make sure your OAI URL points to a specific feed that returns records. If you can log in to the portal server, use the Linux utility curl to confirm that no networking or firewall issues exist. Unless your OAI records already match the default Michigan Service Hub schema found here, the XPath for each field will need to be updated. Using your preferred tool, or even a web-based service such as xPath tester, run each XPath query against a sample record and make sure the correct values are pulled from the record. Mapping errors will cause the Solr indexer to fail silently.
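As a quick sanity check, something like the following should work from the portal server. ListRecords and metadataPrefix are standard OAI-PMH parameters; YOUR_URL, the oai_dc prefix, and the XPath expression are placeholders you will need to adjust. Note that xmllint is namespace-aware, so its XPath syntax may differ slightly from what you put in data-config.xml.

# Confirm the feed is reachable and save a sample response
curl -s "http://YOUR_URL?verb=ListRecords&metadataPrefix=oai_dc" -o sample.xml

# Test an XPath expression against the saved sample
# (local-name() sidesteps namespace prefixes for a quick check)
xmllint --xpath "//*[local-name()='record']//*[local-name()='title']" sample.xml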

(Optional) Loading a Previous GeoLocation Table

If you have a previous instance of the portal running that has items in it, you will have a table full of geolocated places in your database. This data can be useful if you are moving data from one portal instance to another, and it saves you the effort of having the Google GeoLocation API run every time you import items. You can dump this table to a SQL file and reimport it into the new database. If you already have this table saved as a SQL file, skip the next step. If you do not, run the following command (assuming you're using a Linux or macOS terminal):

ssh portal@Your_old_portal_instance 'docker exec -tu postgres spotlight_db_1 pg_dump spotlight_production -t geolocation' > geolocation.sql

The file should now be on your computer. Next, run the following command:

cat geolocation.sql | ssh portal@Your_new_portal_instance 'docker exec -i spotlight_db_1 /usr/bin/psql -h localhost -U postgres spotlight_production'

Done.

If this doesn't work for you or you don't have a native shell with ssh (e.g., Windows), the relevant commands are:

  • Log in to your existing portal server

  • Run: docker exec -tu postgres spotlight_db_1 pg_dump spotlight_production -t geolocation > geolocation.sql

  • Copy the resulting geolocation.sql file to your new server and log in

  • Run: cat geolocation.sql | docker exec -i spotlight_db_1 /usr/bin/psql -h localhost -U postgres spotlight_production

Running a Data Import

After you have finished updating your data-config.xml file, continue with the Solr installation instructions. Navigate to the Solr web interface, select your core on the left, and open the DataImport section. Choose full-import and click Execute. When complete, the indexer should display its status and a success or error message. If there are any errors with the mapping, the readout will look something like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "initArgs": [
    "defaults",
    [
      "config",
      "data-config.xml"
    ]
  ],
  "command": "status",
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "0",
    "Total Rows Fetched": "1000",
    "Total Documents Processed": "0",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2018-06-29 19:36:29",
    "Total Documents Failed": "1000",
    "Time taken": "0:3:2.107",
}
}
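For reference, the same full-import and status check can also be run over HTTP using the standard DataImportHandler endpoints; replace localhost:8983 and YOUR_CORE with your own Solr host and core name.

# Kick off a full import
curl "http://localhost:8983/solr/YOUR_CORE/dataimport?command=full-import"

# Poll the status (returns JSON like the readout above)
curl "http://localhost:8983/solr/YOUR_CORE/dataimport?command=status"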