forked from oharac/bd_chi
-
Notifications
You must be signed in to change notification settings - Fork 0
/
mining_check.Rmd
696 lines (554 loc) · 30.1 KB
/
mining_check.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
---
title: 'Calculate fishing stressor sensitivities from IUCN impact scores'
author: "*Compiled on `r date()` by `r Sys.info()['user']`*"
output:
html_document:
code_folding: hide
toc: true
toc_depth: 3
toc_float: yes
number_sections: true
theme: cerulean
highlight: haddock
includes:
in_header: '~/github/src/templates/ohara_hdr.html'
pdf_document:
toc: true
---
``` {r setup, echo = TRUE, message = FALSE, warning = FALSE}
knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = 'figs/',
echo = TRUE, message = FALSE, warning = FALSE)
source('https://raw.githubusercontent.com/oharac/src/master/R/common.R')
source(here::here('common_fxns.R'))
library(tidytext) ### to help parse ngrams in the narratives
```
# Summary
Stressor sensitivity is a very simplified version of stressor weights (see draft1 folder for complicated version). Here, we will simply note whether a given species is sensitive to a given stressor, using the IUCN impact weights (i.e. non-negligible impact score), ranked as 0-3 (no, low, medium, high impact) for later differentiation if necessary. For species with NA impact scores, we can imply sensitivity based on other fields, e.g. severity. This version of sensitivity is not related to the range of exposure, as the weight calculation was doing.
Sensitivity will denote whether a species range overlapping a stressor range constitutes an impact.
# Methods
## Set up stressor sensitivity dataframe
Using all listed threats and the threat-to-stressor lookup, and focusing on those species with valid maps (i.e. a csv file in `spp_rasts_mol_2019`), identify species with non-negligible impacts from impact score, and where impact is unknown, based on severity field. Here we are looking at fishing threats. To differentiate between pelagic low/high bycatch, demersal destructive, and demersal non-destructive low/high bycatch, we will mine the species narratives for gear types, and use habitat information to locate the species in the water column.
In cases where a species is impacted by multiple threats that map to a single stressor (e.g. different mentions of pelagic high bycatch gear in different threat categories) we summarize to the _maximum_ score out of all listed. For example, a species suffers low impact from unintentional impacts from high bycatch pelagic fishing (score = 3) but medium impact from intentional use (score = 5), the high bycatch pelagic fishing stressor layer will be weighted as a 5 across the board.
In this step we will include all species including LC and DD. We will however limit to comprehensively assessed taxa.
Threats from IUCN mapped to fishing stressors:
| Code | Description | Potential strs |
|:-----:|:----------------------------------------------|:--------------------|
| 5.4 | Biological resource use: Fishing & harvesting aquatic resources | NA |
| 5.4.1 | Intentional use: subsistence/small scale (species being assessed is the target) | art_fish |
| 5.4.2 | Intentional use: large scale (species being assessed is the target) | dem_dest, dem_nondest_hb, dem_nondest_lb, pel_hb, pel_lb |
| 5.4.3 | Unintentional effects: subsistence/small scale (species being assessed is not the target) | art_fish |
| 5.4.4 | Unintentional effects: large scale (species being assessed is not the target) | dem_dest, dem_nondest_hb, dem_nondest_lb, pel_hb, pel_lb |
| 5.4.5 | Persecution/control | NA |
| 5.4.6 | Motivation Unknown/Unrecorded | NA |
```{r set up spp with maps and threats}
spp_marine <- read_csv(here('_data', sprintf('spp_marine_from_api_%s.csv', api_version)))
spp_maps <- read_csv(here('_data', sprintf('spp_marine_maps_%s.csv', api_version)))
spp_risk <- read_csv(here('_data', sprintf('iucn_risk_current_%s.csv', api_version))) %>%
select(iucn_sid, cat_score)
spp_comp <- read_csv(here('_data', sprintf('iucn_comp_assessed_%s.csv', api_version)))
chi_lookup <- read_csv(here('_raw/iucn_threat_to_stressor_lookup.csv')) %>%
mutate(stressor = str_split(stressor, ';')) %>%
unnest(stressor) %>%
filter(str_detect(stressor, '[a-z]')) %>%
filter(category == 'fishing')
spp_threats <- read_csv(file.path(dir_bd_anx, 'iucn/threats',
sprintf('iucn_spp_threats_%s.csv', api_version))) %>%
left_join(chi_lookup, by = 'code') %>%
filter(!is.na(code))
### Join the maps to the threats.
### * recategorize sensitivities based on impact score
### * if no impact score, categorize sensitivities based on severity
### * rather than filtering, set NA and past scores to 0
### * this allows non-threatened species to remain in the mix
threats_to_marine <- spp_marine %>%
left_join(spp_risk, by = 'iucn_sid') %>%
left_join(spp_threats, by = 'iucn_sid') %>%
### code low-med-high sensitivity based on impact:
mutate(sens = case_when(score_num %in% 0:2 ~ FALSE, ### no/negligible impact
score_num > 2 ~ TRUE, ### non-negligible impact
TRUE ~ NA)) %>% ### set all else to NA
### for NA impact, code sensitivity based on severity:
mutate(sev = tolower(severity),
sens = case_when(is.na(sens) & str_detect(sev, 'negligible|no decl') ~ FALSE,
is.na(sens) & str_detect(sev, 'very rapid decl') ~ TRUE,
is.na(sens) & str_detect(sev, 'rapid decl') ~ TRUE,
is.na(sens) & str_detect(sev, 'causing|slow, significant') ~ TRUE,
TRUE ~ sens)) %>%
### set past impacts and unresolved sensitivies to 0:
mutate(sens = ifelse(timing == 'Past, Unlikely to Return', FALSE, sens),
sens = ifelse(is.na(sens), FALSE, sens)) %>%
filter(!is.na(code)) %>% ### drop code mismatches
filter(!is.na(stressor)) %>% ### drop codes that don't match stressors
select(iucn_sid, cat_score, sens,
code, stressor, score_num) %>%
distinct()
### sensitivity categories:
# [1] "Unknown" "Negligible declines"
# [3] NA "No decline"
# [5] "Rapid Declines" "Slow, Significant Declines"
# [7] "Causing/Could cause fluctuations" "Very Rapid Declines"
write_csv(threats_to_marine, here('int/spp_threats_fishing.csv'))
```
``` {r}
n_comp <- spp_marine %>%
filter(iucn_sid %in% spp_maps$iucn_sid) %>%
filter(iucn_sid %in% spp_comp$iucn_sid) %>%
.$iucn_sid %>% unique() %>% length()
n_threatened <- spp_risk %>%
filter(iucn_sid %in% spp_maps$iucn_sid) %>%
filter(iucn_sid %in% spp_comp$iucn_sid) %>%
filter(!is.na(cat_score) & !cat_score %in% c(0, 1)) %>%
.$iucn_sid %>% unique() %>% length()
spp_stressed <- threats_to_marine %>%
filter(iucn_sid %in% spp_maps$iucn_sid) %>%
filter(iucn_sid %in% spp_comp$iucn_sid) %>%
filter(!is.na(cat_score) & !cat_score %in% c(0, 1)) %>%
filter(sens)
n_stressed <- spp_stressed %>%
.$iucn_sid %>% unique() %>% length()
```
Out of the list of `r n_comp` marine species with maps in comprehensively-assessed taxa, `r n_threatened` are neither Least Concern nor Extinct nor Data Deficient.
The resulting number of threatened spp (including Near Threatened) with sensitivity to at least one __fishing__ stressor is: `r n_stressed`.
NOTE: These numbers may include a small handful of spp with effectively zero ocean range, which will not show up in maps, will not affect calculations, etc. These can be found from the impacted area dataframe.
<!-- 6957 comp-assessed mapped species -->
<!-- 1357 spp with category NC to CR -->
<!-- 922 with sensitivity to fishing stressors -->
## Identify stressors impact by gear type
Differentiate sensitivity to fishing effort based on gear type: demersal destructive gear (bottom trawl) and high bycatch (trawl, longline, gillnet).
* intentional use: can be low or high bycatch
* if no high bycatch keywords, assume low bycatch?
* if high bycatch keywords, check also for low bycatch keywords - seine? hand line/pole line?
* unintentional use: assume high bycatch?
Fishing stressor categories by gear type in Watson data, categorized via Halpern et al 2019:
| gear | bycatch | habitat | destructive |
|:--------------------|:-------:|:--------:|:-----------:|
| dredge | high | demersal | yes |
| gillnet | high | dem/pel | no |
| purse seine nontuna | low | pelagic | no |
| purse seine tuna | low | pelagic | no |
| lines non tuna | low | dem/pel | no |
| pole-line tuna | low | pelagic | no |
| longline nontuna | high | dem/pel | no |
| longline tuna | high | pelagic | no |
| trap | high | demersal | no |
| trawl midwater | high | pelagic | no |
| trawl bottom | high | demersal | yes |
```{r set up gear to stressor lookup}
### manually set up the gear types.
gear_to_stressor_df <- tribble(
~gear, ~gear_full, ~bycatch, ~gear_hab, ~dest, ~stressor,
'dredge', 'dredge', 'high', 'dem', T, 'dem_dest',
'gillnet', 'gillnet', 'high', 'dem', F, 'dem_nondest_hb',
'gillnet', 'gillnet', 'high', 'pel', F, 'pel_hb',
'seine', 'purse seine nontuna', 'low', 'pel', F, 'pel_lb',
'seine', 'purse seine tuna', 'low', 'pel', F, 'pel_lb',
'poleline', 'lines non tuna', 'low', 'dem', F, 'dem_nondest_lb',
'poleline', 'lines non tuna', 'low', 'pel', F, 'pel_lb',
'poleline', 'pole-line tuna', 'low', 'pel', F, 'pel_lb',
'longline', 'longline nontuna', 'high', 'dem', F, 'dem_nondest_hb',
'longline', 'longline nontuna', 'high', 'pel', F, 'pel_hb',
'longline', 'longline tuna', 'high', 'pel', F, 'pel_hb',
'trap', 'trap', 'high', 'dem', F, 'dem_nondest_hb',
'midtrawl', 'trawl midwater', 'high', 'pel', F, 'pel_hb',
'btmtrawl', 'trawl bottom', 'high', 'dem', T, 'dem_dest',
'btmtrawl', 'trawl bottom', 'high', 'pel', T, 'dem_dest')
### Note: bottom trawl is listed as impacting pelagic as well, to
### account for a bunch of sharks and rays that are listed only as
### "pelagic" but are noted to be susceptible to bottom trawling pressure.
write_csv(gear_to_stressor_df, here('_raw/gear_to_stressor_lookup.csv'))
```
## Identify species habitat location (demersal vs pelagic)
Habitat codes 10.X are marine pelagic; 9.1 is neritic pelagic. Other 9.X, 11.X, 12.X, 13.X are demersal or benthic.
Note that some species are listed as unknown (18) or other (17) only, which leaves pelagic/demersal status unresolved. When comparing to different gear types, this is problematic, so here we will assume that an unknown or other habitat indicates potential exposure to both demersal and pelagic gear types.
```{r}
spp_dem_pel <- read_csv(file.path(dir_bd_anx, 'iucn',
sprintf('spp_habs_from_api_%s.csv', api_version)),
col_types = cols('iucn_sid' = 'i',
'code' = 'c')) %>%
select(iucn_sid, code, habitat, suitability) %>%
distinct() %>%
filter(iucn_sid %in% spp_marine$iucn_sid) %>% ### limit to marine spp or unknown/other
mutate(code_main = as.integer(str_replace_all(code, '\\..*', ''))) %>%
mutate(dem = code_main %in% c(9, 11, 12, 13) & code != '9.1',
pel = code_main == 10 | code == '9.1') %>%
group_by(iucn_sid) %>%
### check for unresolved pelagic/demersal and unknown/other hab:
mutate(na_check = (sum(pel) + sum(dem) == 0),
pel = ifelse(na_check & code_main %in% 17:18, TRUE, pel),
dem = ifelse(na_check & code_main %in% 17:18, TRUE, dem)) %>%
summarize(pel = any(pel),
dem = any(dem)) %>%
gather(spp_hab, val, pel, dem) %>%
filter(val == TRUE) %>%
select(-val)
write_csv(spp_dem_pel, here('int/spp_demersal_pelagic.csv'))
```
## Check for bycatch and gear instances in narratives
### identify keywords for mid-water and bottom trawls
Since trawl surveys are often used for research, we can avoid instances of "trawl" near "survey" as a pattern; are there any other troublesome patterns? Examine bigrams from the narratives that include 'trawl'.
```{r set up for bigram analysis, eval = TRUE}
spp_narr_file <- file.path(dir_bd_anx, 'iucn',
sprintf('spp_narr_from_api_%s.csv', api_version))
spp_narrs <- read_csv(spp_narr_file,
col_types = cols(.default = 'c', iucn_sid = 'i')) %>%
select(iucn_sid:usetrade) %>%
gather(dimension, text, -iucn_sid) %>%
### remove HTML tags since they seem to affect the unnesting of sentences
mutate(text = str_replace_all(text, '<.*?>', ' '))
### set up some stop words. The stop_words df is a bit large and slow
### to search over... also doesn't drop numbers
cut_words <- data.frame(
word = c('a', 'and', 'or', 'in', 'on', 'of', 'by',
'to', 'but', 'the', 'from',
'[0-9,;\\.]+')
) %>%
mutate(combo = paste0('^', word, ' | ', word, '$')) %>%
### e.g. not "and trawl" or "trawling and"
summarize(to_cut = paste0(combo, collapse = '|'))
```
```{r examine trawl bigrams, eval = TRUE}
trawl_text_df <- spp_narrs %>%
### prefilter to dramatically speed up the unnest_tokens...
filter(str_detect(tolower(text), 'trawl')) %>%
# group_by(iucn_sid, dimension) %>%
unnest_tokens(output = bigram, input = text,
token = 'ngrams', n = 2, drop = FALSE) %>%
filter(str_detect(bigram, 'trawl')) %>%
mutate(pre_gram = ifelse(str_detect(bigram, '^trawl'), TRUE, FALSE))
trawl_bigram_df <- trawl_text_df %>%
select(-text, -iucn_sid) %>%
filter(!str_detect(bigram, cut_words$to_cut)) %>%
group_by(bigram, pre_gram) %>%
summarize(instances = n()) %>%
ungroup() %>%
arrange(desc(instances)) %>%
filter(instances > 1)
DT::datatable(trawl_bigram_df)
```
Combinations to exclude:
* trawl survey
* fishery-independent trawl, exploratory trawl, experimental trawl
* research trawl, scientific trawl
* trawl sample, trawl experiment, trawl research
What kinds of patterns involve non-research trawls, to differentiate between mid-water and bottom? Look at preceding and following words in triads, again using a regex string to identify instances of `[word] trawl[suffix]` or `trawl[suffix] [word]`.
* bottom trawls:
* bottom trawl, deep.?water trawl, demersal trawl, beam trawl, benthic trawl,
* crab trawl, crustacean trawl, groundfish trawl, shrimp trawl
* midwater trawls:
* midwater trawl, pelagic trawl
* indeterminate types?:
* otter trawl
These (exclusion and habitat types) are coded in a regex exception string.
### identify keywords for entanglement and nets
Some gear types, e.g. seines and gillnets, may cause entanglement issues.
```{r examine entangle instances, eval = TRUE}
entangle_text_df <- spp_narrs %>%
### prefilter to dramatically speed up the unnest_tokens...
filter(str_detect(tolower(text), 'entangl')) %>%
unnest_tokens(output = sentence, input = text,
token = 'sentences', drop = TRUE) %>%
# group_by(iucn_sid, dimension) %>%
unnest_tokens(output = word, input = sentence,
token = 'words',
drop = FALSE, collapse = NULL) %>%
filter(str_detect(word, 'entangl')) %>%
mutate(debris = str_detect(sentence, 'debris'),
net = str_detect(sentence, '[^a-z]net[^a-z]|^net[^a-z]|[^a-z]net$'),
trawl = str_detect(sentence, 'trawl'),
seine = str_detect(sentence, 'seine'))
DT::datatable(entangle_text_df %>% select(-word))
entangle_gear_df <- entangle_text_df %>%
gather('gear', 'val', debris:seine) %>%
group_by(gear) %>%
summarize(entangle_counts = sum(val))
knitr::kable(entangle_gear_df)
```
Inspecting these we see nets and anthropogenic debris responsible for entanglements. It seems reasonable to assign entanglement as a keyword for sensitivity to mid- and bottom trawl and gillnets. What about seines? Few instances of that combination in a single sentence, and most of those are "incidental" entanglement rather than attributed to seines.
### identify keywords for pole & line combos
What kinds of patterns involve line, pole-and-line, etc. Look at preceding and following words in triads. Rather than identify instances of tuna vs. non-tuna pole and line, we can simply specify that a given species is sensitive to low bycatch, non-destructive line fishing in whichever habitats the species occurs (pelagic, demersal, both).
```{r trigram analysis for pole and line}
### exclude things like:
### "decline", "longline", "linear", "lineatus", "lineolatus"
line_exclude <- c('[a-z]line|line[ao]|long.?line')
line_text_df <- spp_narrs %>%
### prefilter to dramatically speed up the unnest_tokens...
filter(str_detect(tolower(text), 'line')) %>%
# group_by(iucn_sid, dimension) %>%
unnest_tokens(output = trigram, input = text,
token = 'ngrams', n = 3, drop = FALSE) %>%
filter(str_detect(trigram, 'line') &
!str_detect(trigram, line_exclude))
line_trigram_df <- line_text_df %>%
select(-text, -iucn_sid) %>%
filter(!str_detect(trigram, cut_words$to_cut)) %>%
### will drop "and" (etc) but only at start or end of trigram
group_by(trigram) %>%
summarize(instances = n()) %>%
ungroup() %>%
arrange(desc(instances)) %>%
filter(instances > 1)
DT::datatable(line_trigram_df)
```
* some examples (consider also swapping spaces with hyphens, e.g. [^a-z] or [ -])
* hook(s) (and) line, hand line, troll(ing) line,
* caught by line, caught in line, caught on line, caught with line,
* deep water line, hook and line, pole-and-line, drop-line,
* line fishery, line fisheries, industrial line fishery, inshore line fishing
* commercial line
* exclude: longline (or long line)
### identify keywords for destructive fisheries?
Some species mention threats from destructive fishing practices such as blast/dynamite fishing and cyanide fishing. These are not in our maps of gear types so they will be excluded.
## Parse threats to count by gear type
Once parsed, examine frequency of various gear types. Presumably should be a higher proportion of gears associated with high bycatch - i.e. indiscriminate gear types.
To help differentiate between mid-water trawl and bottom trawl, for species where narratives do not clearly differentiate, we will use habitat information to determine sensitivity.
```{r set up gear keywords}
### create search terms for midwater trawl (separate into "trawl"-specific
### vs keywords that don't involve "trawl")
gears_midtrawl1 <- c('mid.?water trawl', 'pelagic trawl')
gears_midtrawl2 <- c('entangl')
### create search terms for bottom trawl:
gears_btmtrawl1 <- c('bottom trawl', 'deep.?water trawl',
'demersal trawl', 'beam trawl', 'benthic trawl',
'crab trawl', 'crustacean trawl', 'groundfish trawl',
'shrimp trawl', 'prawn trawl', 'scallop trawl')
gears_btmtrawl2 <- c('deep sea shrimp fisher',
'deep sea crustacean fisher',
'deep.?water demersal fisher',
'danish seine', 'entangl')
### create keywords for non-fishery "trawl" instances
trawl_except <- c(
'(explora|experim|research|scientific|fisher[a-z]+.indep)[a-z]* trawl',
'trawl[a-z]* (surv|sample|experiment|research)')
### create search terms for line fishing (but not longline):
gears_line <- c('caught [a-z]+ line',
# caught in/on/with line (won't include longline)
'deep.?water line',
'hook.and.line',
'pole.and.line',
'drop.?line',
'(?<!long.?)line fish',
# line fishery excl longline/long-line, etc
'commercial line') # line fishing/fishery/fisheries
### create search terms for gillnet fishing:
gears_net <- c('gill.?net', 'drift.?net', 'set.?net')
### create regex of gear type search terms
gears_of_interest <- c('by.?catch',
'dredge',
'long.?line',
gears_line,
'seine',
gears_net,
'[^a-z]trap', ### avoid "entrapment"
'trawl',
'entangl',
'fisher') %>%
paste0(collapse = '|')
```
``` {r search narratives for terms}
gear_narr_df <- spp_narrs %>%
gather(dimension, text, -iucn_sid) %>%
### prefilter for speedier unnesting of tokens:
filter(str_detect(tolower(text), gears_of_interest)) %>%
tidytext::unnest_tokens(input = text, output = sentence, token = 'sentences') %>%
### then filter to drop sentences without gear mentions:
filter(str_detect(sentence, gears_of_interest))
### count all the "trawl" instances, and subtract out the non-stressor trawls
trawl_narr_class <- gear_narr_df %>%
mutate(trawl_count = str_count(sentence, 'trawl'),
trawl_count = trawl_count - str_count(sentence, trawl_except))
gear_narr_class <- gear_narr_df %>%
mutate(dredge_count = str_count(sentence, 'dredge'),
longline_count = str_count(sentence, 'long.?line'),
poleline_count = str_count(sentence, paste0(gears_line, collapse = '|')),
seine_count = str_count(sentence, 'seine'), ### note: includes danish seine
trap_count = str_count(sentence, 'trap'),
gillnet_count = str_count(sentence, paste0(gears_net, collapse = '|')),
### trawl counts associated with explicit mention of "trawl":
midtrawl_count1 = str_count(sentence, paste0(gears_midtrawl1, collapse = '|')),
btmtrawl_count1 = str_count(sentence, paste0(gears_btmtrawl1, collapse = '|')),
### trawl counts associated with non-"trawl" descriptions:
midtrawl_count2 = str_count(sentence, paste0(gears_midtrawl2, collapse = '|')),
btmtrawl_count2 = str_count(sentence, paste0(gears_btmtrawl2, collapse = '|'))) %>%
full_join(trawl_narr_class, by = c('iucn_sid', 'dimension', 'sentence')) %>%
### check that the midtrawl and btmtrawl counts ("trawl"-specific) don't exceed the
### total non-scientific/exploratory trawls
mutate(midtrawl_count1 = ifelse(midtrawl_count1 > trawl_count,
trawl_count, midtrawl_count1),
btmtrawl_count1 = ifelse(btmtrawl_count1 > trawl_count,
trawl_count, btmtrawl_count1)) %>%
### if "trawl" counts but no mid- or btm-trawl, assign trawl counts to btm_trawl
### (assume generic trawl means bottom)
mutate(btmtrawl_count1 = ifelse(midtrawl_count1 == 0 & btmtrawl_count1 == 0,
trawl_count, btmtrawl_count1)) %>%
### consolidate the mid-water trawls and bottom trawls
mutate(btmtrawl_count = btmtrawl_count1 + btmtrawl_count2,
midtrawl_count = midtrawl_count1 + midtrawl_count2) %>%
### clean up
select(-trawl_count, -sentence,
-btmtrawl_count1, -btmtrawl_count2,
-midtrawl_count1, -midtrawl_count2)
gear_narr_tidy <- gear_narr_class %>%
gather(gear, count, ends_with('count')) %>%
mutate(gear = str_replace_all(gear, '_count', '')) %>%
group_by(iucn_sid, gear) %>%
summarize(count = sum(count)) %>%
ungroup()
write_csv(gear_narr_tidy, here('int/gear_narr_tidy.csv'))
```
## Combine gear-to-stressors with species IDs and habitats
```{r}
gear_narr_tidy <- read_csv(here('int/gear_narr_tidy.csv'))
spp_dem_pel <- read_csv(here('int/spp_demersal_pelagic.csv'))
gear_to_stressor_lookup <- read_csv(here('_raw/gear_to_stressor_lookup.csv'))
spp_threats_from_iucn <- read_csv(here('int/spp_threats_fishing.csv'))
### identify all the unique fishing stressors
stressors <- spp_threats_from_iucn$stressor %>% unique()
### separate out artisanal fishing threats as these are not gear-based
art_fish_from_iucn <- spp_threats_from_iucn %>%
filter(stressor == 'art_fish') %>%
select(iucn_sid, stressor, sens, code) %>%
distinct()
### for gear-related stressors, bind spp and gear habitats per species,
### and join to the threats
gear_sens_df <- gear_narr_tidy %>%
left_join(spp_dem_pel, by = 'iucn_sid') %>%
left_join(gear_to_stressor_lookup, by = 'gear') %>%
filter(spp_hab == gear_hab | is.na(spp_hab)) %>%
### some spp have NA for spp_hab - assume these can be either hab
left_join(spp_threats_from_iucn, by = c('iucn_sid', 'stressor'))
### Summarize sensitivity to each stressor as any non-zero counts
### within that stressor category. Complete the list so all stressors
### show up for all species, leaving NAs for those added via complete().
gear_sens_summary <- gear_sens_df %>%
group_by(iucn_sid, stressor, code, score_num) %>%
filter(!is.na(sens)) %>%
summarize(sens = sum(count * sens) > 0) %>%
bind_rows(art_fish_from_iucn) %>%
group_by(iucn_sid) %>%
complete(stressor = stressors, fill = list(sens = FALSE)) %>%
ungroup() %>%
mutate(score_num = ifelse(score_num >= 3 & !sens, NA, score_num))
write_csv(gear_sens_summary,
here('int', sprintf('spp_sens_fishing_%s.csv', api_version)))
```
## Check the number of spp sensitive to different types...
```{r}
spp_sens <- gear_sens_summary %>%
filter(iucn_sid %in% spp_maps$iucn_sid) %>%
filter(iucn_sid %in% spp_comp$iucn_sid) %>%
left_join(spp_risk, by = 'iucn_sid') %>%
filter(!is.na(cat_score) & !cat_score %in% c(0, 1)) %>%
filter(sens)
n_spp_sens <- spp_sens %>%
.$iucn_sid %>%
unique() %>%
length()
drop_check_all <- spp_marine %>%
filter(iucn_sid %in% spp_stressed$iucn_sid) %>%
filter(!iucn_sid %in% spp_sens$iucn_sid) %>%
left_join(spp_narrs, by = 'iucn_sid') %>%
left_join(spp_maps, by = 'iucn_sid') %>%
left_join(spp_comp %>% select(-sciname), by = 'iucn_sid') %>%
select(iucn_sid, sciname, dimension, assess_gp, text) %>%
distinct()
drop_check <- drop_check_all %>%
### prefilter to dramatically speed up the unnest_tokens...
filter(dimension != 'habitat') %>%
filter(str_detect(tolower(text), 'fish')) %>%
unnest_tokens(output = sentence, input = text,
token = 'sentences', drop = TRUE) %>%
filter(str_detect(sentence, 'fish'))
```
According to this analysis so far, there are `r n_spp_sens` distinct threatened species (with maps, comprehensively assessed, etc) affected by various fishing stressors. Note that `r n_stressed` species are listed as threatened therefore a few are not being captured via gear type. See here:
`r DT::datatable(drop_check)`
```{r}
spp_per_stressor <- gear_sens_summary %>%
select(-code) %>%
filter(sens) %>%
distinct() %>%
filter(iucn_sid %in% spp_stressed$iucn_sid) %>%
count(stressor) %>%
arrange(n) %>%
mutate(stressor = fct_inorder(stressor))
ggplot(spp_per_stressor, aes(x = stressor, y = n)) +
ggtheme_plot() +
geom_col() +
coord_flip() +
labs(title = 'sensitive spp by stressor',
y = 'number of species affected by stressor')
```
Most species are affected by artisanal fishing.
```{r}
stressor_per_spp <- gear_sens_summary %>%
select(-code) %>%
filter(sens) %>%
distinct() %>%
filter(iucn_sid %in% spp_stressed$iucn_sid) %>%
count(iucn_sid) %>%
mutate(n_txt = as.character(n))
ggplot(stressor_per_spp, aes(x = n_txt)) +
ggtheme_plot() +
geom_bar() +
labs(title = 'stressor sensitivities per spp',
x = 'number of stressors on spp',
y = 'number of species')
```
Most species are affected by only one fishing stressor.
## Check for species listed as a threat but not otherwise captured
```{r}
threats <- read_csv(here('int/spp_threats_fishing.csv')) %>%
rename(thr_sens = sens) %>%
mutate(thr_listed = TRUE)
# iucn_sid; cat_score; code; stressor; thr_sens; thr_listed
gear_sum <- read_csv(here('int', sprintf('spp_sens_fishing_%s.csv', api_version))) %>%
rename(gear_sens = sens)
# iucn_sid; stressor; code; gear_sens
spp_comp <- read_csv(here('_data', sprintf('iucn_comp_assessed_%s.csv', api_version)))
tmp <- threats %>%
left_join(gear_sum, by = c('iucn_sid', 'stressor', 'code')) %>%
group_by(iucn_sid, code) %>%
summarize(n_sens = sum(thr_sens),
n_stressor = sum(gear_sens, na.rm = TRUE)) %>%
ungroup()
tmp_nomatch <- tmp %>%
filter(n_sens >= 1 & n_stressor == 0) %>%
group_by(iucn_sid) %>%
summarize(codes = paste(code, collapse = ', ')) %>%
left_join(spp_comp, by = 'iucn_sid') %>%
filter(!is.na(assess_gp)) %>%
filter(assess_gp != 'sturgeons')
table(tmp_nomatch$assess_gp)
DT::datatable(tmp_nomatch)
```
Note that each of these is listed as being threatened by fishing pressures, but no categorization of gear type was found in the narratives. Inspect each of these to better understand/inform better definition of keywords.
First round: 129 unmatched spp.
* Fixed the `spp_hab == gear_hab` filter by adding `spp_hab == gear_hab | is.na(spp_hab)`
* Added "deep sea (shrimp|crustacean) fisher"(ies) to search term for trawl
* fixed up the "trawl" counting and method to exclude "trawl survey" etc by splitting explicit mentions of "trawl" from other terms without the word "trawl" e.g. "deep water shrimp fishery"
After these, 37 species remain that seem to have significant fishing impacts but not enough detail in the narratives to identify gear types. To be conservative, these will simply be dropped at this point.
```{r}
x <- readxl::read_excel(here('unmatched_fishing_threats.xlsx'))
z <- tmp_nomatch %>%
left_join(spp_narrs, by = 'iucn_sid') %>%
filter(dimension %in% c('threats')) %>%
# filter(dimension %in% c('threats', 'rationale', 'conservationmeasures')) %>%
filter(!is.na(text))
# filter(!iucn_sid %in% x$iucn_sid)
non_threats <- c('no major threats',
'no apparent fishing threats',
'indirect fishing impacts',
'competition with fisheries - not direct',
'destructive',
'NA')
### species manually checked and not yet addressed
y <- x %>% filter(iucn_sid %in% z$iucn_sid) %>%
filter(!is.na(notes)) %>%
filter(!str_detect(notes, paste0(non_threats, collapse = '|')))
### species manually checked and noted as "no specific fishery" - can't infer
### what type of gear is involved
yy <- x %>% filter(iucn_sid %in% z$iucn_sid) %>%
filter(notes == 'no specific fishery')
### non-matched not included in check excel - should be zero
yyy <- z %>% filter(!iucn_sid %in% x$iucn_sid)
DT::datatable(y)
```