Where the score is the same, actives will rank higher than inactives #5

Open
baoilleach opened this issue Dec 19, 2017 · 0 comments

While using this benchmark, I recently noticed a small error in the line:

scores[fp].append(sorted(single_score[fp], reverse=True))

...which occurs in several similarly-named Python scripts.

Since each element of single_score[fp] is a tuple of (simscore, id, active/inactive), the sort does rank first by similarity, but ties are then broken by ID. The actives have IDs containing 'A' where the decoys have 'D', so they rank higher whenever the similarity is the same. Sorting on the similarity alone is not enough to avoid the problem: Python's sort is stable and the actives are added to the list first, so among tied scores they will always appear ahead of the decoys. In other words, a random shuffle is needed before the sort. Here is a potential fix:

# random.seed(1) at the top of the file
random.shuffle(single_score[fp])
scores[fp].append(sorted(single_score[fp], reverse=True, key=lambda x:x[0]))
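
To illustrate the stable-sort part of the problem, here is a minimal, self-contained sketch. The data, the IDs, and the helper names (`rank_by_score`, `rank_shuffled`) are made up for the demonstration and are not taken from the benchmark scripts. It assumes only that the actives are appended before the decoys, as described above.

```python
import random

# Hypothetical entries: (similarity, id, is_active), actives appended first.
entries = [
    (0.5, "A1", 1),
    (0.5, "A2", 1),
    (0.5, "D1", 0),
    (0.5, "D2", 0),
    (0.9, "D3", 0),
]


def rank_by_score(items):
    """Sort on similarity only; ties keep their insertion order (stable sort)."""
    return sorted(items, reverse=True, key=lambda x: x[0])


def rank_shuffled(items, rng):
    """Shuffle a copy first so that ties are broken at random, then sort."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return sorted(shuffled, reverse=True, key=lambda x: x[0])


# Without a shuffle, the tied block is always actives first, then decoys.
stable = rank_by_score(entries)
tied_ids = [e[1] for e in stable if e[0] == 0.5]

# With a shuffle, an active leads the tied block only about half the time.
rng = random.Random(1)  # seeded for reproducibility
trials = 2000
active_first = 0
for _ in range(trials):
    ranked = rank_shuffled(entries, rng)
    first_tied = next(e for e in ranked if e[0] == 0.5)
    active_first += first_tied[2]
fraction_active_first = active_first / trials

print(tied_ids)
print(fraction_active_first)
```

Using a dedicated `random.Random` instance rather than the module-level `random.seed` is a design choice for this sketch only; it keeps the demonstration reproducible without touching global state.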