Create a DOI index in VuFind #88

Open
dbietila opened this issue Dec 4, 2019 · 4 comments

dbietila commented Dec 4, 2019

Create an index of DOIs in VuFind. DOIs occur in some 856 fields, and in 024 fields.

For 024 fields, we want to index 024|a when 024|2 is equal to ‘doi’. We also need to index 856|u fields that contain valid DOIs.

Some records may contain multiple 856|u’s with valid DOIs. Ex: 8883838. We should index each DOI in this case.

An example of the standard DOI syntax is in Bib # 2352930, where the value is http://dx.doi.org/10.1787/16812328. Even in this case we can strip the http://dx.doi.org/ prefix, since only the portion starting with “10.” is needed to retrieve the material.

DOIs in the 856|u can occur in a variety of non-standard syntaxes. Bib # 11761529 has an 856|u with the value http://link.springer.com/10.1007/978-981-10-6026-7. In this case, 10.1007/978-981-10-6026-7 is the meaningful DOI value.

Bib # 9130371 has an 856|u of http://onlinelibrary.wiley.com/book/10.1029/GM093. This can be trimmed to 10.1029/GM093.

There are regular expressions for filtering valid DOIs available here:
https://www.crossref.org/blog/dois-and-matching-regular-expressions/
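As a rough illustration of the trimming described above, here is a minimal Python sketch. It uses an unanchored variant of the "modern DOI" pattern from the Crossref post (the original is anchored with ^ and $, so it only matches a bare DOI). The names DOI_PATTERN, extract_doi and extract_dois are hypothetical, and the sketch assumes the DOI is the last part of the URL, which holds for the three examples in this issue but may not hold for every 856|u.

```python
import re

# Unanchored variant of the Crossref "modern DOI" pattern, so it can find a DOI
# embedded in a longer URL. Assumption: nothing DOI-like follows the DOI itself;
# a trailing path segment such as "/summary" would be captured as part of the DOI.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(value):
    """Return the first DOI found in a string such as an 856|u URL, or None."""
    match = DOI_PATTERN.search(value.strip())
    return match.group(0) if match else None


def extract_dois(values):
    """Return the DOIs found in a list of 856|u values, in order, without duplicates."""
    found = []
    for value in values:
        doi = extract_doi(value)
        if doi is not None and doi not in found:
            found.append(doi)
    return found


if __name__ == "__main__":
    # URLs taken from the bib records mentioned in this issue.
    urls = [
        "http://dx.doi.org/10.1787/16812328",
        "http://link.springer.com/10.1007/978-981-10-6026-7",
        "http://onlinelibrary.wiley.com/book/10.1029/GM093",
    ]
    for url in urls:
        print(url, "->", extract_doi(url))
```

Values with no DOI-like substring return None, so they can simply be skipped at index time.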

We should use a Solr analyzer to similarly trim search terms that are directed to this index.
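To show why the same trimming should apply at query time, here is a small Python model of the idea. It is not Solr configuration: in Solr this would presumably be an analyzer on the field type, whereas here a plain dictionary stands in for the index. The names normalize_doi and DoiIndex are hypothetical, and lowercasing is an added assumption based on DOIs being case-insensitive.

```python
import re

# Same DOI shape as the Crossref "modern DOI" pattern, unanchored so that it
# also matches inside a full URL.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)


def normalize_doi(text):
    """Reduce a bare DOI or a DOI URL to a lowercase bare DOI, or None if none is found."""
    match = DOI_PATTERN.search(text.strip())
    return match.group(0).lower() if match else None


class DoiIndex:
    """Toy exact-match index: the same normalizer runs at index time and query time."""

    def __init__(self):
        self._records = {}  # normalized DOI -> set of record ids

    def add(self, record_id, raw_value):
        key = normalize_doi(raw_value)
        if key is not None:
            self._records.setdefault(key, set()).add(record_id)

    def search(self, query):
        key = normalize_doi(query)
        if key is None:
            return []
        return sorted(self._records.get(key, set()))


if __name__ == "__main__":
    index = DoiIndex()
    index.add("9130371", "http://onlinelibrary.wiley.com/book/10.1029/GM093")
    print(index.search("10.1029/GM093"))
    print(index.search("http://dx.doi.org/10.1029/gm093"))
```

Because both sides pass through one normalizer, a search on the bare DOI and a search on a full URL resolve to the same key.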


dbietila commented Dec 4, 2019

Some reports of records with DOIs are in this Box folder:
https://uchicago.box.com/s/0ndf9r5699x9kc6x3t9gyq74ots6qlw8

@todolson
Member

What kind of index should this be? That is, it seems like an exact string match on the DOI itself. No text tokenizing or anything, just an exact string match on something like:

10.1007/978-981-10-6026-7

Does that seem correct?

@dbietila
Author

That makes sense to me. Matt and Keith are working on an urgent SFX issue, but I'll ask them to review this when they are available.

@seanfilipov

I made DOI numbers searchable in VuFind. Test code is on antares.
You can search with a DOI number or the whole URL.
