Use open-source Snorkel to create labelling functions to expand our training dataset. #65

Open
bruffridge opened this issue Jul 2, 2021 · 2 comments

@bruffridge
Member

bruffridge commented Jul 2, 2021

https://www.snorkel.org/get-started/

https://github.com/snorkel-team/snorkel

Many of our labelling functions will use MAG (Microsoft Academic Graph) topics. For these we will use the free MAG API's 'evaluate' method. I will provide an API key for this.

'Evaluate' method 'Try it out' page:
https://msr-apis.portal.azure-api.net/docs/services/academic-search-api/operations/565d753be597ed16ac3ffc03?
API limits
10,000 transactions per month, 3 per second for interpret, 1 per second for evaluate, 6 per minute for calcHistogram.
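A minimal sketch of a client-side throttle that respects the per-second limits above, so a batch of labelling-function queries does not trip the API. The `Throttle` class is hypothetical (not part of any MAG SDK), and it does not track the 10,000-per-month cap, which would need a separate shared counter.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between calls to one API method."""

    def __init__(self, calls_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / calls_per_second
        self.clock = clock      # injectable so the throttle can be tested
        self.sleep = sleep
        self.last_call: Optional[float] = None

    def wait(self) -> None:
        """Block until min_interval has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# One throttle per method, using the limits listed above.
evaluate_throttle = Throttle(calls_per_second=1)
interpret_throttle = Throttle(calls_per_second=3)
calc_histogram_throttle = Throttle(calls_per_second=6 / 60)  # 6 per minute
```

Calling `evaluate_throttle.wait()` immediately before each evaluate request keeps requests at most one per second.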

API Documentation
https://docs.microsoft.com/en-us/academic-services/project-academic-knowledge/reference-query-expression-syntax
https://docs.microsoft.com/en-us/academic-services/project-academic-knowledge/reference-evaluate-method

List of Microsoft Academic topics
https://academic.microsoft.com/topics/100858432?fullPath=false

Example API request that returns the Id, DOI, title, venue, topics, author Ids, abstract words, and references of papers labelled with 'wind stress' OR 'wind engineering' AND either labelled with 'biology' OR published in the Biomimetics journal. This query would back a labelling function for 'protect from wind'.

https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?expr=And(Or(Composite(F.FN=='wind stress'),Composite(F.FN=='wind engineering')),Or(Composite(J.JN=='biomimetics'), Composite(F.FN=='biology')))&model=latest&count=10&offset=0&attributes=Id,DOI,Ti,VFN,F.FN,AA.AuId,AW,RId
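The expression in the request above can be assembled programmatically for each topic rule. A sketch, assuming lowercase values for `F.FN` (topic) and `J.JN` (journal) as in the example request; the helper names are hypothetical. It only builds the URL and does not call the API or attach the key.

```python
from urllib.parse import quote, urlencode

EVALUATE_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"
ATTRIBUTES = "Id,DOI,Ti,VFN,F.FN,AA.AuId,AW,RId"


def composite(attr: str, value: str) -> str:
    """Equality test on a composite attribute such as F.FN or J.JN."""
    return f"Composite({attr}=='{value.lower()}')"


def or_(*terms: str) -> str:
    return terms[0] if len(terms) == 1 else f"Or({','.join(terms)})"


def and_(*terms: str) -> str:
    return terms[0] if len(terms) == 1 else f"And({','.join(terms)})"


def topic_rule_expr(topics: list[str]) -> str:
    """(topic_1 OR topic_2 ...) AND (journal:'biomimetics' OR 'biology')."""
    context = or_(composite("J.JN", "biomimetics"), composite("F.FN", "biology"))
    return and_(or_(*(composite("F.FN", t) for t in topics)), context)


def evaluate_url(expr: str, count: int = 10, offset: int = 0) -> str:
    params = {"expr": expr, "model": "latest", "count": count,
              "offset": offset, "attributes": ATTRIBUTES}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{EVALUATE_URL}?{urlencode(params, quote_via=quote)}"


expr = topic_rule_expr(["wind stress", "wind engineering"])
url = evaluate_url(expr)
```

For the 'protect from wind' rule, `expr` reproduces the `expr` parameter of the example request above.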
@bruffridge
Member Author

bruffridge commented Jul 2, 2021

Labelling functions for labels with < 10 examples

Sense temperature cues: ('biology' OR journal:'biomimetics') AND ('temperature measurement' OR 'temperature monitoring' OR 'temperature sensing')
Send light signals in the non-visible spectrum: ('biology' OR journal:'biomimetics') AND 'ultraviolet'
Capture energy: ('biology' OR journal:'biomimetics') AND 'energy harvesting'
Protect from wind: ('biology' OR journal:'biomimetics') AND ('wind stress' OR 'wind engineering')
Detox/purify: ('biology' OR journal:'biomimetics') AND 'detoxification'
Break down structure: ('biology' OR journal:'biomimetics') AND ('breakup' OR 'breakage')
Modify/convert electrical energy: ('biology' OR journal:'biomimetics') AND ('electrochemical energy conversion' OR 'thermoelectric energy conversion')
Absorb and/or filter gases: ('biology' OR journal:'biomimetics') AND ('air filter' OR 'air filtration')
Camouflage/mimicry: ('biology' OR journal:'biomimetics') AND ('camouflage' OR 'mimicry')
Differentiate signal from noise: ('biology' OR journal:'biomimetics') AND ('signal differentiation' OR 'noise (signal processing)')
Prevent fatigue: ('biology' OR journal:'biomimetics') AND ('fatigue resistance' OR 'frictional resistance')
Protect from radiation: ('biology' OR journal:'biomimetics') AND ('radiation protection' OR 'radiation resistance' OR 'radiation tolerance' OR 'radiation resistant' OR 'radiation-protective agents' OR 'radiation inactivation')
Absorb and/or filter liquids: ('biology' OR journal:'biomimetics') AND ('absorbance' OR 'filtration') AND ('water content' OR 'water treatment' OR 'water quality' OR 'water pollution' OR 'water supply' OR 'seawater' OR 'wastewater' OR 'groundwater' OR 'tap water' OR 'raw water' OR 'soil water' OR 'river water' OR 'fresh water' OR 'surface water' OR 'brackish water' OR 'distilled water' OR 'portable water purification' OR 'coagulation (water treatment)' OR 'sedimentation (water treatment)' OR 'water column' OR 'liquid medium')
Protect from gases: ('biology' OR journal:'biomimetics') AND ()
Send electrical/magnetic signals: 'biology' AND ()
Sense motion: 'biology' AND ()
Chemically break down inorganic compounds: 'biology' AND ()
Expel gases: 'biology' AND ()
Send sound signals: 'biology' AND ()
Sense spatial awareness/balance/orientation: 'biology' AND ()
Manage drag/turbulence: 'biology' AND ()
Protect from fire: 'biology' AND ()
Store gases: 'biology' AND ()
Absorb and/or filter solids: 'biology' AND ()
Modify/convert magnetic energy: 'biology' AND ()
Send vibratory signals: 'biology' AND ()
Chemically assemble inorganic compounds: 'biology' AND ()
Compete within or between species: 'biology' AND ()
Cooperate within or between species: 'biology' AND ()
Manage environmental disturbance in a community: 'biology' AND ()
Self-replicate: 'biology' AND ()
Send chemical signals: 'biology' AND ()
Send tactile signals: 'biology' AND ()
Sense chemicals: 'biology' AND ()
Sense disease: 'biology' AND ()
Store solids: 'biology' AND ()
Not biomimicry: ?
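The rules above could be turned into labelling functions that vote on papers already fetched with the evaluate method. A sketch, assuming each paper is a dict shaped like an evaluate entity (topics under `F[*].FN`, venue name under `VFN`) and using Snorkel's convention of `-1` for abstain; the label index and helper names are hypothetical. Topics within a group are OR'd and the groups are AND'd, which also covers three-part rules such as 'Absorb and/or filter liquids'.

```python
ABSTAIN = -1           # Snorkel's convention for "no vote"
PROTECT_FROM_WIND = 0  # hypothetical label index


def topics(paper) -> set[str]:
    """Lowercased MAG topic names attached to a paper."""
    fields = paper.get("F")
    if not isinstance(fields, list):
        return set()
    return {f["FN"].lower() for f in fields}


def in_context(paper) -> bool:
    """'biology' topic OR published in the Biomimetics journal."""
    venue = paper.get("VFN")
    in_journal = isinstance(venue, str) and venue.lower() == "biomimetics"
    return in_journal or "biology" in topics(paper)


def make_topic_lf(label: int, *topic_groups: list[str]):
    """Vote `label` if the paper is in context and matches every topic group."""
    groups = [{t.lower() for t in group} for group in topic_groups]

    def lf(paper) -> int:
        found = topics(paper)
        if in_context(paper) and all(found & group for group in groups):
            return label
        return ABSTAIN

    return lf


lf_protect_from_wind = make_topic_lf(PROTECT_FROM_WIND,
                                     ["wind stress", "wind engineering"])
```

To use one with Snorkel, it could be wrapped as `LabelingFunction(name="protect_from_wind", f=lf_protect_from_wind)` from `snorkel.labeling` and applied with `PandasLFApplier` over a DataFrame of papers.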

@bruffridge
Member Author

Another idea for a labelling function: take our ground-truth labelled papers and use the MAG API to find papers related to each one. Each related paper is assigned the same label as the ground-truth paper it is related to.

Here's how to get related papers from MAG (kudos to @dsmith111 for figuring this out!):

Passing a paper's Id into this link, where *** is the Id, returns all of its related paper Ids in a clean JSON format: https://academic.microsoft.com/api/entity/***?entityType=2

Example link:
https://academic.microsoft.com/api/entity/3012421327?entityType=2
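A sketch of the label-propagation idea, assuming the related Ids have already been extracted from the entity endpoint's JSON (its exact structure is not shown here, so no parsing is attempted). `propagate_labels` is a hypothetical helper; it treats each ground-truth paper as having one label and abstains when a paper is related to ground-truth papers with different labels, which is one possible policy among several.

```python
from collections import defaultdict

ABSTAIN = -1  # Snorkel's convention for "no vote"


def propagate_labels(ground_truth: dict[int, int],
                     related: dict[int, list[int]]) -> dict[int, int]:
    """Give each related paper the label of the ground-truth paper it relates to.

    ground_truth: ground-truth paper Id -> label
    related: ground-truth paper Id -> related paper Ids
    """
    votes: dict[int, set[int]] = defaultdict(set)
    for paper_id, label in ground_truth.items():
        for rel_id in related.get(paper_id, []):
            if rel_id not in ground_truth:  # never override ground truth
                votes[rel_id].add(label)
    # Conflicting labels from different ground-truth papers -> abstain.
    return {pid: next(iter(labels)) if len(labels) == 1 else ABSTAIN
            for pid, labels in votes.items()}
```

As a labelling function, this reduces to a lookup on the paper's Id, e.g. `lambda x: propagated.get(x["Id"], ABSTAIN)`.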


4 participants