Skip to content

Useful SQL Queries

Peter Mangiafico edited this page Mar 4, 2019 · 4 revisions

Dump identifiers for approved publications to a CSV file:

DOIs:

mysql -e "select identifier_value from publication_identifiers pi inner join contributions c on pi.publication_id = c.publication_id where pi.identifier_type='doi' and pi.certainty='confirmed' and c.status='approved'" -u pub -p sul-pub > 'stanford-dois.csv'

Total approved publications:

select count(distinct(publication_id)) from contributions where status='approved';

Recent contributions for publications with a WoS UID and a PubMed ID:

select * from contributions, publications
where publications.pmid is not null and
      publications.wos_uid is not null and
      contributions.publication_id = publications.id
order by publications.created_at desc, contributions.author_id asc
limit 30

Recent contributions for publications with a WosUID and no PubMed ID:

select * from contributions, publications
where publications.pmid is null and
      publications.wos_uid is not null and
      contributions.publication_id = publications.id
order by publications.created_at desc, contributions.author_id asc
limit 30

Authors with many alternate identities

Find authors with a lot of institution data by looking for how many start_date values they have:

SELECT *, count(*) as count
FROM author_identities
WHERE start_date is not NULL
GROUP BY author_id
ORDER BY count desc

Authors with common last names

To find some common last names:

SELECT preferred_last_name, count(preferred_last_name) as 'num'
FROM authors
WHERE preferred_last_name is not null
GROUP BY preferred_last_name
ORDER BY count(preferred_last_name) desc
LIMIT 100

To find authors with a common last name who have many approved contributions:

SELECT a.preferred_first_name, a.preferred_middle_name, a.preferred_last_name, count(c.id) as 'seed_count'
FROM authors as a LEFT JOIN contributions as c
ON a.id=c.author_id
WHERE a.preferred_last_name = 'Wang'
AND c.status = 'approved'
GROUP BY a.id
ORDER BY count(c.id) desc
LIMIT 100