Web Of Science API Queries
See related evaluation of how the WoS API meets our query use-cases:
The WoS API doc (ver 3.0, July 7, 2015) contains:
- p. 29 has Data Citation Index with the field DO=DOI
- p. 33 has SciELO Citation Index with the same field
- p. 34 has Web of Science Core Collection with the DO=DOI field
- p. 53 has a couple of things with a DOI field and xpath:
  - Book Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers
  - Digital Object Identifier (DOI) .../dynamic_data/cluster_related/identifiers
The WoS technical support advisor indicates that the DO field queries are:
- partial string matches
- case insensitive
This presents problems for accurate matching of DOI identifier strings. See related issues and work in
Added a DOI search option to the WosQueries object to test this kind of query, e.g.
wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);
dois = PublicationIdentifier.where(identifier_type: 'DOI').limit(200).sample(10).map do |pub_id|
  pub_id[:identifier_uri] || pub_id[:identifier_value]
end
doi_records = dois.compact.map {|doi| sleep(1); wos_queries.search_by_doi(doi) }
# prune the results that failed to match a DOI
doi_records = doi_records.reject {|rec| rec.empty? }
Then extract identifiers from the records, when they are available, e.g.
doi_records.map {|rec| rec.doc.search('identifier').map {|id| "#{id['type']}: #{id['value']}" } }
=> [
["issn: 0002-7863", "doi: 10.1021/ja310831m"],
["issn: 0091-7613", "doi: 10.1130/G31449.1", "xref_doi: 10.1130/G31449.1"],
["issn: 0002-8703", "eissn: 1097-5330", "doi: 10.1016/j.ahj.2013.05.024"],
["issn: 0278-7407", "art_no: ARTN TC2006", "doi: 10.1029/2007TC002172"]]
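Because the DO field is a partial-string match, a returned record is not guaranteed to carry the exact DOI that was queried. One way to guard against that is to compare the queried DOI with the identifiers extracted from each record. The following is a minimal sketch, assuming the identifiers are available as `[type, value]` pairs; `normalize_doi` and `exact_doi_match?` are hypothetical helpers, not part of `WosQueries`.

```ruby
# Hypothetical helper (not part of WosQueries): reduce a DOI string to a
# comparable form by trimming whitespace, lower-casing it, and dropping a
# leading resolver URL or "doi:" label.
def normalize_doi(doi)
  doi.to_s.strip.downcase
     .sub(%r{\Ahttps?://(dx\.)?doi\.org/}, '')
     .sub(/\Adoi:\s*/, '')
end

# Hypothetical helper: true only when one of the record's DOI identifiers
# equals the queried DOI after normalization (no partial matches).
def exact_doi_match?(query_doi, identifiers)
  wanted = normalize_doi(query_doi)
  identifiers.any? do |type, value|
    %w[doi xref_doi].include?(type) && normalize_doi(value) == wanted
  end
end

# Identifier pairs taken from one of the records shown above.
identifiers = [
  ['issn', '0091-7613'],
  ['doi', '10.1130/G31449.1'],
  ['xref_doi', '10.1130/G31449.1']
]

puts exact_doi_match?('10.1130/g31449.1', identifiers) # same DOI, different case
puts exact_doi_match?('10.1130/G31449', identifiers)   # only a prefix of the DOI
```

The comparison is deliberately case-insensitive, so only the partial-match behaviour is filtered out.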
A record for Russ Altman contains a WoS UID, <UID>MEDLINE:24551397</UID>, and it has some dynamic data fields that indicate this UID is a PMID, i.e.
<dynamic_data>
  <citation_related>
    <tc_list>
      <silo_tc coll_id="MEDLINE" local_count="0"/>
    </tc_list>
  </citation_related>
  <cluster_related>
    <identifiers>
      <identifier type="eissn" value="1942-597X"/>
      <identifier type="pmid" value="MEDLINE:24551397"/>
    </identifiers>
  </cluster_related>
</dynamic_data>
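The UID above has the shape `COLLECTION:ID`, and for the MEDLINE collection the ID part appears to be the PMID. The following is a minimal sketch of converting between the two forms, assuming that shape holds for every UID; `split_uid`, `pmid_from_uid` and `medline_uid` are hypothetical helpers, not part of `WosQueries`.

```ruby
# Hypothetical helper: split a WoS UID such as "MEDLINE:24551397" into its
# collection prefix and identifier. Returns nil when either part is missing.
def split_uid(uid)
  collection, id = uid.to_s.split(':', 2)
  return nil if collection.to_s.empty? || id.to_s.empty?

  { collection: collection, id: id }
end

# Hypothetical helper: the PMID for a MEDLINE UID, or nil for any other UID.
def pmid_from_uid(uid)
  parts = split_uid(uid)
  parts[:id] if parts && parts[:collection] == 'MEDLINE'
end

# Hypothetical helper: build the UID that retrieve_by_id would be given,
# assuming a PMID maps to "MEDLINE:{PMID}".
def medline_uid(pmid)
  "MEDLINE:#{pmid}"
end

puts pmid_from_uid('MEDLINE:24551397')
puts medline_uid(24_551_397)
```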
Find a record with a PMID in the rails console:
> pub_ids = PublicationIdentifier.select('DISTINCT publication_id').where(identifier_type: 'PMID').limit(1).first
=> #<PublicationIdentifier:0x005605437947c8 id: nil, publication_id: 1>
> Publication.find(pub_ids.publication_id).pmid
=> 10000166
Assume this PMID value can be a MEDLINE UID in the new WoS SOAP API, i.e. MEDLINE:{PMID}. Then on the wos-queries branch (PR #223), we can try to retrieve the record in the rails console.
wos_queries = WosQueries.new(WosClient.new(Settings.WOS.AUTH_CODE, :debug), 'MEDLINE')
records = wos_queries.retrieve_by_id('MEDLINE:10000166')
This constructs a query that includes:
<woksearch:retrieveById>
<databaseId>MEDLINE</databaseId>
<uid>MEDLINE:10000166</uid>
<queryLanguage>en</queryLanguage>
It failed to find that record. The result is the same when the default WOK database is used instead of MEDLINE. The response includes:
<recordsFound>0</recordsFound>
<recordsSearched>27482575</recordsSearched>
However, when I use the PMID/MEDLINE UID from the comment above, it works!
records = wos_queries.retrieve_by_id('MEDLINE:24551397')
The response includes the metadata and the record data too:
<queryId>3</queryId>
<recordsFound>1</recordsFound>
<recordsSearched>27482575</recordsSearched>
<optionValue>
<label>RecordIDs</label>
<value>MEDLINE:24551397</value>
Trying to sample a few PMIDs from the sul_pub prod-db to see if any can be found on the new SOAP API (note that the iteration must include a sleep(1) to avoid hitting throttle errors):
pmids = PublicationIdentifier.where(identifier_type: 'PMID').limit(50).sample(10).map(&:identifier_value)
#=> ["10002407", "1001090", "10007847", "10009788", "10012482", "10014304", "10013432", "10013714", "10012537", "10014322"]
pmids_found = pmids.map do |pmid|
  sleep(1)
  [pmid, wos_queries.retrieve_by_id("MEDLINE:#{pmid}").count > 0]
end
The hit rate for that PMID search is low, at most about 50% (1 of 10 in the first run below, 5 of 10 in the others), e.g. a few runs of those queries:
[["10002407", false], ["1001090", true], ["10007847", false], ["10009788", false],
["10012482", false], ["10014304", false], ["10013432", false], ["10013714", false],
["10012537", false], ["10014322", false]]
[["10021829", true], ["10021418", true], ["10010525", false], ["10021470", true],
["10014084", false], ["10018836", false], ["10013226", false], ["10029025", true],
["10019707", false], ["10022419", true]]
[["10021829", true], ["10021418", true], ["10010525", false], ["10021470", true],
["10014084", false], ["10018836", false], ["10013226", false], ["10029025", true],
["10019707", false], ["10022419", true]]
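The hit rate for each run can be computed directly from the `[pmid, found]` pairs. The following is a minimal sketch using the first two runs above; `hit_rate` is a hypothetical helper name.

```ruby
# Hypothetical helper: fraction of [pmid, found] pairs where found is true.
def hit_rate(results)
  return 0.0 if results.empty?

  results.count { |_pmid, found| found }.fdiv(results.size)
end

run_1 = [['10002407', false], ['1001090', true], ['10007847', false], ['10009788', false],
         ['10012482', false], ['10014304', false], ['10013432', false], ['10013714', false],
         ['10012537', false], ['10014322', false]]
run_2 = [['10021829', true], ['10021418', true], ['10010525', false], ['10021470', true],
         ['10014084', false], ['10018836', false], ['10013226', false], ['10029025', true],
         ['10019707', false], ['10022419', true]]

puts hit_rate(run_1) # => 0.1
puts hit_rate(run_2) # => 0.5
```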
First, let's retrieve a publication record by the WosID:
wos_id = '000070953800034'
PublicationIdentifier.where(identifier_type: 'WoSItemID', identifier_value: wos_id)
wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :debug);
wos_queries = WosQueries.new(wos_client);
records = wos_queries.retrieve_by_id("WOS:#{wos_id}")
records.print # view the record XML
# that works
It seems the only REC data that contains additional identifiers is in
- xpath: //dynamic_data/cluster_related/identifiers
- e.g. as in comment above
<dynamic_data>
  <cluster_related>
    <identifiers>
      <identifier type="eissn" value="1942-597X"/>
      <identifier type="pmid" value="MEDLINE:24551397"/>
    </identifiers>
  </cluster_related>
</dynamic_data>
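The identifiers at that xpath can be pulled out of a record's XML with any XPath-capable parser. The following is a minimal sketch using REXML, assuming the record XML has no namespaces and that the fragment is wrapped in a `REC` element; `extract_identifiers` is a hypothetical helper, not a `WosQueries` method.

```ruby
require 'rexml/document'

# Hypothetical helper: return [type, value] pairs for every identifier
# found under //dynamic_data/cluster_related/identifiers.
def extract_identifiers(record_xml)
  doc = REXML::Document.new(record_xml)
  path = '//dynamic_data/cluster_related/identifiers/identifier'
  REXML::XPath.match(doc, path).map do |element|
    [element.attributes['type'], element.attributes['value']]
  end
end

# Sample record built from the fragment above (the REC wrapper is assumed).
record_xml = <<~XML
  <REC>
    <dynamic_data>
      <cluster_related>
        <identifiers>
          <identifier type="eissn" value="1942-597X"/>
          <identifier type="pmid" value="MEDLINE:24551397"/>
        </identifiers>
      </cluster_related>
    </dynamic_data>
  </REC>
XML

extract_identifiers(record_xml).each { |type, value| puts "#{type}: #{value}" }
```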
wos_client = WosClient.new(Settings.WOS.AUTH_CODE, :info);
wos_queries = WosQueries.new(wos_client);
records = wos_queries.search_by_name('Altman, Russ');
records.count # => 464
Trying to find these records in the prod-db:
wos_ids = records.uids.map {|uid| ids = uid.split(':'); ids.last if ids.first == 'WOS' }.compact;
wos_ids.count #=> 411
wos_items = wos_ids.map {|wos| PublicationIdentifier.where(identifier_type: 'WoSItemID', identifier_value: wos).first }.compact;
wos_items.count #=> 342 of 411 found
# For PMID
pmids = records.uids.map {|uid| ids = uid.split(':'); ids.last if ids.first == 'MEDLINE' }.compact;
pubmed_items = pmids.map {|pmid| PublicationIdentifier.where(identifier_type: 'PMID', identifier_value: pmid).first }.compact;
pmids.count #=> 53
pubmed_items.count #=> 51 of 53 found
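The two passes over `records.uids` above can be expressed as a single grouping by collection prefix, and the match counts turned into percentages. The following is a minimal sketch, assuming every UID has the form `COLLECTION:ID`; `group_uids_by_collection` and `coverage_percent` are hypothetical helpers.

```ruby
# Hypothetical helper: group UIDs by collection prefix, keeping only the
# identifier part of each UID in the grouped values.
def group_uids_by_collection(uids)
  uids.group_by { |uid| uid.split(':', 2).first }
      .transform_values { |group| group.map { |uid| uid.split(':', 2).last } }
end

# Hypothetical helper: percentage of retrieved IDs that were found locally.
def coverage_percent(found, total)
  return 0.0 if total.zero?

  (found.fdiv(total) * 100).round(1)
end

# Two UIDs that appear earlier in this thread.
uids = ['WOS:000070953800034', 'MEDLINE:24551397']
p group_uids_by_collection(uids)

# Counts reported above: 342 of 411 WOS IDs and 51 of 53 PMIDs were found.
puts coverage_percent(342, 411) # => 83.2
puts coverage_percent(51, 53)   # => 96.2
```

The 411 WOS UIDs and 53 MEDLINE UIDs account for all 464 records returned by the name search.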