Where the score is the same, actives will rank higher than inactives #5

baoilleach · 2017-12-19T13:49:47Z

While using this benchmark, I recently noticed a small error in the following line:

benchmarking_platform/scoring/data_sets_I/calculate_scored_lists.py

Line 179 in f37fc62

scores[fp].append(sorted(single_score[fp], reverse=True))

...which occurs in several similarly-named Python scripts.

Since each element of single_score[fp] is a tuple of (simscore, id, active/inactive), the sort does indeed rank first by similarity, but it then breaks ties by Id. The actives have Ids with 'A' in them instead of the 'D' used for the decoys, and so rank higher when the similarity is the same. However, sorting by the similarity alone is not sufficient to avoid this problem: Python's sort is stable, and the actives are added to the list first, so they will always appear ahead of decoys with the same score. In other words, a random shuffle is needed before sorting. Here is a potential fix:

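# at the top of the file: import random, then random.seed(1)
random.shuffle(single_score[fp])  # randomize the order of tied entries before sorting
scores[fp].append(sorted(single_score[fp], reverse=True, key=lambda x: x[0]))  # rank by similarity only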
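To illustrate the stable-sort point, here is a minimal, self-contained sketch. The toy data, the Ids, and the helper names rank_stable and rank_shuffled are hypothetical and not taken from the benchmarking platform; the only assumption carried over is that actives are appended to the list before decoys.

```python
import random

# Hypothetical toy list of (simscore, id, active/inactive) tuples.
# Actives (flag 1) are appended before decoys (flag 0), as described above.
scored = [
    (0.5, "A1", 1),
    (0.5, "A2", 1),
    (0.5, "D1", 0),
    (0.5, "D2", 0),
    (0.9, "D3", 0),
]


def rank_stable(items):
    """Sort by similarity only; tied entries keep their insertion order."""
    return sorted(items, key=lambda x: x[0], reverse=True)


def rank_shuffled(items, rng):
    """Shuffle a copy first, then sort by similarity, so ties land in random order."""
    items = list(items)  # copy so the caller's list is left untouched
    rng.shuffle(items)
    return sorted(items, key=lambda x: x[0], reverse=True)


# Without the shuffle, the tied actives always precede the tied decoys.
print([entry[1] for entry in rank_stable(scored)])
# ['D3', 'A1', 'A2', 'D1', 'D2']

# With the shuffle, the order within the tie depends on the random state.
print([entry[1] for entry in rank_shuffled(scored, random.Random(1))])
```

Using a dedicated random.Random instance rather than the module-level generator is only a choice made for this sketch; it keeps the example reproducible without touching global state.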
