* run the software mention service:
cd ~/grobid/software-mentions
./gradlew run
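`./gradlew run` typically keeps the service in the foreground, so the client has to be launched from another terminal once the service answers. The sketch below polls a URL with curl until it responds or a retry limit is reached. The function name `wait_for_service` is hypothetical, and so is the URL: these notes do not give a health-check endpoint, so pass whichever URL your service configuration exposes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers, or give up after N tries.
# The URL is an assumption supplied by the caller; these notes do not define one.
wait_for_service() {
  local url="$1" tries="${2:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    # --fail makes curl return non-zero on HTTP errors (4xx/5xx)
    if curl --silent --fail --output /dev/null "$url"; then
      echo "service is up: $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "service did not answer after $tries attempts: $url" >&2
  return 1
}
```

Usage: `wait_for_service "<service URL>" 60 && echo "ready for the client"`.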
* start a MongoDB instance if it is not started automatically:
sudo service mongod start
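The "if not automatically started" check can be scripted so that the start command only runs when needed. This is a minimal sketch assuming a Linux host with `pgrep` and the same `mongod` service name as above; the function name `ensure_mongod` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start mongod only if no mongod process is running.
# Assumes pgrep is available and the service is named "mongod".
ensure_mongod() {
  if pgrep -x mongod > /dev/null; then
    echo "mongod already running"
  else
    echo "starting mongod"
    sudo service mongod start
  fi
}
```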
* once the database is filled (here a MongoDB database called softcite-cord19-scibert), the relevant collections can be checked with the mongo console:
mongo
use softcite-cord19-scibert
db.references.count()
db.documents.count()
db.annotations.count()
db.annotations.aggregate( [ {$unwind: "$software-name.normalizedForm"}, {$sortByCount: "$software-name.normalizedForm" } ] )
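The same checks can be run without opening the interactive console, which is handy for logging progress while the database fills. This sketch passes a script to the legacy `mongo` shell via its `--quiet` and `--eval` options. The `$unwind`/`$sortByCount` pipeline is the one above, with an extra `$limit` to the 20 most frequent normalized software names (an arbitrary choice for readability). `check_db` and `DB_NAME` are hypothetical names.

```shell
#!/usr/bin/env bash
# Non-interactive version of the console checks above.
# DB_NAME defaults to the database used in these notes.
DB_NAME="${DB_NAME:-softcite-cord19-scibert}"

check_db() {
  # Single quotes keep $unwind / $sortByCount away from shell expansion.
  mongo "$DB_NAME" --quiet --eval '
    ["references", "documents", "annotations"].forEach(function (c) {
      print(c + ": " + db.getCollection(c).count());
    });
    db.annotations.aggregate([
      { $unwind: "$software-name.normalizedForm" },
      { $sortByCount: "$software-name.normalizedForm" },
      { $limit: 20 }
    ]).forEach(printjson);
  '
}
```

On newer MongoDB releases, where the legacy `mongo` shell is no longer shipped, `mongosh` accepts the same `--quiet` and `--eval` options.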
* install/run the client:
source env/bin/activate
python3 software_mentions_client.py --data-path /media/lopez/store/cord-19/data/ --config my_config.json
* results are stored in MongoDB, but are also written as *.software.json files alongside the full texts; to count them:
> find /media/lopez/store/cord-19/data/ -name "*.software.json" -type f | wc -l
to clean up the result files:
> find /media/lopez/store/cord-19/data3/ -name "*.software.json" -type f -delete
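Counting and cleaning can be wrapped as two small functions taking the data directory as an argument. The `-name` pattern is quoted so the shell does not expand `*.software.json` against the current directory before `find` sees it. The function names are hypothetical; `-delete` is the same `find` action used above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the find commands above.
# $1: data directory containing the full texts and *.software.json results.
count_software_json() {
  # tr strips the leading spaces some wc implementations print
  find "$1" -name "*.software.json" -type f | wc -l | tr -d ' '
}

clean_software_json() {
  # -print lists each file before -delete removes it
  find "$1" -name "*.software.json" -type f -print -delete
}
```

Usage: `count_software_json /media/lopez/store/cord-19/data/` before and after `clean_software_json` on the same directory.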
* export from MongoDB for further ingestion into the KB:
> mongoexport -d softcite-cord19-scibert -c annotations -o /media/lopez/store/cord-19/scibert-db2/softcite-cord19-annotations.json
> mongoexport -d softcite-cord19-scibert -c documents -o /media/lopez/store/cord-19/scibert-db2/softcite-cord19-documents.json
> mongoexport -d softcite-cord19-scibert -c references -o /media/lopez/store/cord-19/scibert-db2/softcite-cord19-references.json
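The three exports differ only in the collection name, so they can be run in one loop that stops at the first failure. This is a sketch; `export_collections`, `DB_NAME` and `OUT_DIR` are hypothetical names, with defaults set to the database and output directory used above.

```shell
#!/usr/bin/env bash
# Export the annotations, documents and references collections in one pass.
DB_NAME="${DB_NAME:-softcite-cord19-scibert}"
OUT_DIR="${OUT_DIR:-/media/lopez/store/cord-19/scibert-db2}"

export_collections() {
  local c
  for c in annotations documents references; do
    # Output file names follow the softcite-cord19-<collection>.json pattern above
    mongoexport -d "$DB_NAME" -c "$c" -o "$OUT_DIR/softcite-cord19-$c.json" || return 1
  done
}
```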