
Web Of Science API Queries

Darren L. Weber, Ph.D edited this page Oct 9, 2017 · 8 revisions

See related evaluation of how the WoS API meets our query use-cases:

Find DOI Records

The WoS API doc (ver 3.0, July 7, 2015) contains:

  • p. 29: Data Citation Index, with the field DO=DOI
  • p. 33: SciELO Citation Index, with the same field
  • p. 34: Web of Science Core Collection, with the DO=DOI field
  • p. 53: two entries with a DOI field and xpath:
    • Book Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers
    • Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers

The WoS technical support advisor indicates that the DO field queries are:

  • partial string matches
  • case insensitive

This presents problems for accurate matching of DOI identifier strings. See related issues and work in
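Because DO queries match partial strings, a search for one DOI can also return records whose DOI merely contains it. One way to guard against that is to post-filter the results, as sketched below. The sketch assumes the candidate DOI strings have already been extracted from each record; `normalize_doi` and `exact_doi_match?` are hypothetical helpers, not part of WosQueries, and the second candidate DOI is made up for illustration.

```ruby
# Hypothetical helpers (not part of WosQueries): post-filter DOI search results.

# Strip common URI/label prefixes and fold case, since DOIs are case insensitive.
def normalize_doi(doi)
  doi.to_s.strip.downcase.sub(%r{\A(?:https?://(?:dx\.)?doi\.org/|doi:\s*)}, '')
end

# True only when the whole DOI matches, not just a substring of it.
def exact_doi_match?(query_doi, candidate_doi)
  normalize_doi(query_doi) == normalize_doi(candidate_doi)
end

# The second candidate is a made-up DOI that a partial match would also return.
candidates = ['10.1130/G31449.1', '10.1130/G31449.12', '10.1130/g31449.1']
matches = candidates.select do |doi|
  exact_doi_match?('https://doi.org/10.1130/G31449.1', doi)
end
```

Here `matches` keeps the first and third candidates and drops the longer DOI that only shares a prefix with the query.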

A DOI search option was added to the WosQueries object to test this, e.g.

wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);

dois = PublicationIdentifier.where(identifier_type: 'DOI').limit(200).sample(10).map do |pub_id|
  pub_id[:identifier_uri] || pub_id[:identifier_value]
end
doi_records = dois.compact.map {|doi| sleep(1); wos_queries.search_by_doi(doi) }
# prune the results that failed to match a DOI
doi_records = doi_records.reject {|rec| rec.empty? }

Then extract identifiers from the records, when they are available, e.g.

doi_records.map {|rec| rec.doc.search('identifier').map {|id| "#{id['type']}: #{id['value']}" } }
=> [
["issn: 0002-7863", "doi: 10.1021/ja310831m"], 
["issn: 0091-7613", "doi: 10.1130/G31449.1", "xref_doi: 10.1130/G31449.1"], 
["issn: 0002-8703", "eissn: 1097-5330", "doi: 10.1016/j.ahj.2013.05.024"], 
["issn: 0278-7407", "art_no: ARTN TC2006", "doi: 10.1029/2007TC002172"]]

Find PMID Records

A record for Russ Altman has the WoS UID <UID>MEDLINE:24551397</UID>, and its dynamic data fields indicate that this UID is a PMID, i.e.

  <dynamic_data>
    <citation_related>
      <tc_list>
        <silo_tc coll_id="MEDLINE" local_count="0"/>
      </tc_list>
    </citation_related>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>

Find a record with a PMID in the rails console:

> pub_ids = PublicationIdentifier.select('DISTINCT publication_id').where(identifier_type: 'PMID').limit(1).first
=> #<PublicationIdentifier:0x005605437947c8 id: nil, publication_id: 1>
> Publication.find(pub_ids.publication_id).pmid
=> 10000166

Assume this PMID can be used as a MEDLINE UID in the new WoS SOAP API, i.e. MEDLINE:{PMID}. On the wos-queries branch (PR #223), we can then try to retrieve the record in the rails console:

wos_queries = WosQueries.new(WosClient.new(Settings.WOS.AUTH_CODE, :debug), 'MEDLINE')
records = wos_queries.retrieve_by_id('MEDLINE:10000166')

This constructs a query that includes:

    <woksearch:retrieveById>
      <databaseId>MEDLINE</databaseId>
      <uid>MEDLINE:10000166</uid>
      <queryLanguage>en</queryLanguage>

The query failed to find that record, and using the default WOK database instead makes no difference. The response includes:

        <recordsFound>0</recordsFound>
        <recordsSearched>27482575</recordsSearched>

However, retrieval works with the MEDLINE UID from the Russ Altman record above:

records = wos_queries.retrieve_by_id('MEDLINE:24551397')

The response includes the metadata and the record data too:

        <queryId>3</queryId>
        <recordsFound>1</recordsFound>
        <recordsSearched>27482575</recordsSearched>
        <optionValue>
          <label>RecordIDs</label>
          <value>MEDLINE:24551397</value>
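
The MEDLINE:{PMID} convention assumed in these retrievals can be wrapped in small helpers. This is a sketch only: `medline_uid` and `pmid_from_uid` are hypothetical names, and a well-formed UID says nothing about whether WoS actually holds a record for it.

```ruby
# Hypothetical helpers for the assumed MEDLINE:{PMID} UID convention.

# Build a WoS UID string from a PMID given as an Integer or a String.
def medline_uid(pmid)
  digits = pmid.to_s.strip
  raise ArgumentError, "not a PMID: #{pmid.inspect}" unless digits.match?(/\A\d+\z/)
  "MEDLINE:#{digits}"
end

# Recover the PMID from a UID; returns nil for UIDs from other collections.
def pmid_from_uid(uid)
  collection, id = uid.to_s.split(':', 2)
  id if collection == 'MEDLINE'
end

uid = medline_uid(10000166)
pmid = pmid_from_uid('MEDLINE:24551397')
other = pmid_from_uid('WOS:000070953800034')
```

With these inputs, `uid` is the string passed to `retrieve_by_id` above, `pmid` is the bare PMID, and `other` is nil because the UID belongs to the WOS collection.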

Sample a few PMIDs from the sul_pub prod-db to see whether they can be found with the new SOAP API (note that the iteration must include a sleep(1) to avoid throttle errors):

pmids = PublicationIdentifier.where(identifier_type: 'PMID').limit(50).sample(10).map(&:identifier_value)
#=> ["10002407", "1001090", "10007847", "10009788", "10012482", "10014304", "10013432", "10013714", "10012537", "10014322"]
pmids_found = pmids.map do |pmid|
  sleep(1)
  [pmid, wos_queries.retrieve_by_id("MEDLINE:#{pmid}").count > 0 ]
end

The hit rate for that PMID search is low, from 10% to 50% in the runs below:

[["10002407", false], ["1001090", true], ["10007847", false], ["10009788", false], 
["10012482", false], ["10014304", false], ["10013432", false], ["10013714", false],
["10012537", false], ["10014322", false]]
[["10021829", true], ["10021418", true], ["10010525", false], ["10021470", true],
["10014084", false], ["10018836", false], ["10013226", false], ["10029025", true],
["10019707", false], ["10022419", true]]
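
The hit rate can be computed directly from the [pmid, found] pairs. A minimal sketch, using the first two runs listed above as input; `hit_rate` is a hypothetical helper name.

```ruby
# Fraction of [pmid, found] pairs where the record was found.
def hit_rate(results)
  return 0.0 if results.empty?
  results.count { |_pmid, found| found }.fdiv(results.size)
end

run_1 = [["10002407", false], ["1001090", true], ["10007847", false],
         ["10009788", false], ["10012482", false], ["10014304", false],
         ["10013432", false], ["10013714", false], ["10012537", false],
         ["10014322", false]]
run_2 = [["10021829", true], ["10021418", true], ["10010525", false],
         ["10021470", true], ["10014084", false], ["10018836", false],
         ["10013226", false], ["10029025", true], ["10019707", false],
         ["10022419", true]]

rates = [run_1, run_2].map { |run| hit_rate(run) }
```

For these two runs, `rates` is `[0.1, 0.5]`. Samples of 10 are too small to say much more than that many PMIDs are not retrievable this way.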

To get PMIDs for WoS Records by WosID

First, let's retrieve a publication record by the WosID:

wos_id = '000070953800034'
PublicationIdentifier.where(identifier_type: 'WoSItemID', identifier_value: wos_id)
wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);
records = wos_queries.retrieve_by_id("WOS:#{wos_id}")
records.print # view the record XML
# that works

It seems that the only REC data containing additional identifiers is at

  • xpath: //dynamic_data/cluster_related/identifiers
  • e.g. as in the Russ Altman record above:

  <dynamic_data>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>
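
Pulling the PMID out of that identifiers cluster can be sketched with REXML, so that the example runs without the Rails app (the console snippets on this page use `rec.doc.search` instead). The XML fragment is copied from above. Real records may declare an XML namespace, which this sketch does not handle, and a repeated identifier type would overwrite the earlier value in the hash.

```ruby
require 'rexml/document'

# Fragment copied from the record shown above.
xml = <<~XML
  <dynamic_data>
    <cluster_related>
      <identifiers>
        <identifier type="eissn" value="1942-597X"/>
        <identifier type="pmid" value="MEDLINE:24551397"/>
      </identifiers>
    </cluster_related>
  </dynamic_data>
XML

doc = REXML::Document.new(xml)

# Collect type => value pairs from the identifiers cluster.
identifiers = {}
REXML::XPath.each(doc, '//dynamic_data/cluster_related/identifiers/identifier') do |node|
  identifiers[node.attributes['type']] = node.attributes['value']
end

# The pmid identifier carries the MEDLINE: prefix; strip it to get the bare PMID.
pmid = identifiers['pmid'].to_s.sub(/\AMEDLINE:/, '')
```

For this fragment `pmid` is "24551397", which could then be matched against PublicationIdentifier rows of type 'PMID'.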

Locating existing sul_pub records for new WoS records

wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :info);
wos_queries = WosQueries.new(wos_client);
records = wos_queries.search_by_name('Altman, Russ');
records.count  # => 464

Try to find these records in the prod-db:

wos_ids = records.uids.map {|uid| ids = uid.split(':'); ids.last if ids.first == 'WOS' }.compact;
wos_ids.count  #=> 411
wos_items = wos_ids.map {|wos| PublicationIdentifier.where(identifier_type: 'WoSItemID', identifier_value: wos).first }.compact;
wos_items.count  #=> 342 of 411 found
# For PMID
pmids = records.uids.map {|uid| ids = uid.split(':'); ids.last if ids.first == 'MEDLINE' }.compact;
pubmed_items = pmids.map {|pmid| PublicationIdentifier.where(identifier_type: 'PMID', identifier_value: pmid).first }.compact;
pmids.count  #=> 53
pubmed_items.count  #=> 51 of 53 found
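
The two passes over `records.uids` above can be folded into one grouping step. A minimal sketch on plain strings; `ids_by_collection` is a hypothetical helper, and the sample UIDs are the ones used earlier on this page.

```ruby
# Group UIDs such as "WOS:000070953800034" by their collection prefix.
def ids_by_collection(uids)
  uids.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |uid, groups|
    collection, id = uid.split(':', 2)
    groups[collection] << id if id # skip strings without a "COLLECTION:" prefix
  end
end

uids = ['WOS:000070953800034', 'MEDLINE:24551397', 'MEDLINE:10000166']
groups = ids_by_collection(uids)
wos_ids = groups['WOS']
pmids = groups['MEDLINE']
```

Each group can then be looked up in PublicationIdentifier with the matching identifier_type, as in the console session above.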