Update the call to canonical_target_name() in parse_all.py #26

stevenlujpl · 2021-06-22T21:25:50Z

The canonical_target_name() function in utils.py accepts 4 parameters, but only 1 parameter is passed to the call to canonical_target_name() in parse_all.py.

parser-indexer-py/src/parserindexer/parse_all.py

Lines 140 to 158 in 3f4a084

    
    cont = {
        'label': 'contains',  # also stored as 'type'
        # target_names (list), cont_names (list)
        'target_names': [canonical_target_name(ex[0]['word'])],
        'cont_names':   [canonical_name(ex[1]['word'])],
        # target_ids (list), cont_ids (list)
        # - p_id prepended in indexer.py
        'target_ids': ['%s_%d_%d' % (ex[0]['ner'].lower(),
                ex[0]['characterOffsetBegin'],
                ex[0]['characterOffsetEnd'])],
        'cont_ids': ['%s_%d_%d' % (ex[1]['ner'].lower(),
                ex[1]['characterOffsetBegin'],
                ex[1]['characterOffsetEnd'])],
        # excerpt_t (sentence)
        'sentence': ' '.join([t['originalText'] for \
                              t in ex[2]['tokens']]),
        # source: 'corenlp' (later, change to 'jsre')
        'source': 'corenlp',
    }

parser-indexer-py/src/parserindexer/utils.py

Lines 138 to 161 in 3f4a084

    
    def canonical_target_name(name, id, targets, aliases):
        """
        Gets canonical target name
        :param name - name whose canonical name is to be looked up
        :return canonical name
        """
        name = name.strip()
        # Look up 'name' in the aliases; if found, replace with its antecedent
        # Note: this is super permissive.  Exact match on id is safe,
        # but we're also allowing any exact-text match with any other
        # known target name.
        all_targets = [t['annotation_id_s'] for t in targets
                       if t['name'] == name]
        name_aliases = [a['arg2_s'] for a in aliases
                        if ((a['arg1_s'] == id) or
                            (a['arg1_s'] in all_targets))]
        if len(name_aliases) > 0:
            # Ideally there is only one; let's use the first one
            can_name = [t['name'] for t in targets \
                            if t['annotation_id_s'] == name_aliases[0]]
            print('Mapping <%s> to <%s>' % (name, can_name[0]))
            name = can_name[0]
        return re.sub(r"[\s_-]+", " ", name).title().replace(' ', '_')
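The mismatch between the two excerpts can be reproduced in isolation. The sketch below uses a stub with the same four-parameter signature as the utils.py excerpt (the body is a placeholder, not the real implementation) and mimics the one-argument call from parse_all.py. The wrapper `call_like_parse_all` and the target name 'Big Sky' are hypothetical, introduced only for illustration.

```python
def canonical_target_name(name, id, targets, aliases):
    # Placeholder body: only the signature matters for this demonstration.
    return name


def call_like_parse_all(word):
    """Mimic the one-argument call in parse_all.py and report the outcome."""
    try:
        return canonical_target_name(word)
    except TypeError as err:
        # Python rejects the call before the function body runs,
        # because id, targets, and aliases have no default values.
        return 'TypeError: %s' % err


result = call_like_parse_all('Big Sky')
print(result)
```

Since none of the extra parameters has a default, the call fails as soon as that line in parse_all.py is reached.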


stevenlujpl · 2021-08-12T01:37:42Z

@wkiri It seems there are two canonical_target_name() functions. One is in the utils.py script of the parser-indexer repo, and the other one is in the name_utils.py of the MTE repo.

The function in the MTE repo is easy to follow, but I am not sure I fully understand the intention of the function in the parser-indexer repo. I also don't see how to prepare the inputs needed to call the function in the parser-indexer repo (specifically the targets and aliases parameters).

I think the function in the parser-indexer repo may be outdated and should be replaced with the one from the MTE repo. Could you please help take a look?
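For reference, a name-only variant might look like the sketch below. It keeps just the `strip()` call and the final normalization line from the utils.py excerpt and drops the alias lookup. The helper name `canonical_name_only` and the sample inputs are hypothetical, and this is not the MTE repo's name_utils.py implementation, which is not shown in this thread.

```python
import re


def canonical_name_only(name):
    """Hypothetical helper: normalize a target name without alias lookup.

    Collapses runs of whitespace, underscores, and hyphens into single
    spaces, title-cases the result, then joins the words with underscores.
    """
    return re.sub(r"[\s_-]+", " ", name.strip()).title().replace(' ', '_')


examples = [canonical_name_only(n) for n in ('big sky', ' Big-Sky ', 'big_sky')]
print(examples)
```

One caveat of this normalization: `str.title()` also lowercases the remaining letters of each word, so an all-caps name such as 'ABC' would come out as 'Abc'.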

wkiri · 2021-08-18T16:44:58Z

@stevenlujpl It looks to me like the additional arguments were added to allow target matching when known aliases are present. This seems to be used only by (and was probably motivated by) brat_ann_indexer.py, which reads .ann files and stores them in Solr. Since we are no longer using Solr, I think that entire file is deprecated, for MTE purposes at least.
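The alias matching can be sketched with toy inputs. The input shapes below are inferred from the keys the utils.py excerpt reads: each target is a dict with 'annotation_id_s' and 'name', and each alias is a dict whose 'arg1_s' is the id of the aliased mention and whose 'arg2_s' is the id of its antecedent. That reading of the two alias fields is an assumption based on the code comment, and the ids and names are made up. The function is a condensed copy of the excerpt with the print call omitted.

```python
import re


def canonical_target_name(name, id, targets, aliases):
    """Condensed copy of the utils.py excerpt (print call omitted)."""
    name = name.strip()
    # Ids of every known target whose text matches 'name' exactly.
    all_targets = [t['annotation_id_s'] for t in targets
                   if t['name'] == name]
    # Antecedent ids for aliases anchored at 'id' or at a same-text target.
    name_aliases = [a['arg2_s'] for a in aliases
                    if a['arg1_s'] == id or a['arg1_s'] in all_targets]
    if name_aliases:
        # Use the first antecedent, as the original does.
        can_name = [t['name'] for t in targets
                    if t['annotation_id_s'] == name_aliases[0]]
        name = can_name[0]
    return re.sub(r"[\s_-]+", " ", name).title().replace(' ', '_')


# Hypothetical annotations: T2 is an alias mention whose antecedent is T1.
targets = [
    {'annotation_id_s': 'T1', 'name': 'Windjana'},
    {'annotation_id_s': 'T2', 'name': 'the drill target'},
]
aliases = [{'arg1_s': 'T2', 'arg2_s': 'T1'}]

aliased = canonical_target_name('the drill target', 'T2', targets, aliases)
unaliased = canonical_target_name('big_sky', 'T9', targets, aliases)
print(aliased, unaliased)
```

With these inputs the aliased mention resolves to its antecedent's name, while a name with no matching alias only goes through the final normalization. As in the original, `can_name[0]` would raise an IndexError if an alias pointed to an id that is missing from targets.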

This raises a larger question. The same comment about outdated Solr capabilities applies to csvindexer.py, indexer.py, and solr.py, all of which were set up with Solr infrastructure. Now that we've moved to SQLite, perhaps we should transition (copy?) the current *_parser.py files and json2brat.py back into the main MTE repository and remove the dependency on the parser-indexer repository. These files output JSON, without Solr involved. This would also allow the parsing files to access the same name_utils.py that is in the MTE repo without duplication. Please share your thoughts on this.

stevenlujpl self-assigned this Jun 22, 2021
