Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring code for better performance and code quality #280

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 5 additions & 7 deletions tensorflow_transform/info_theory.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
"""Utilities for information-theoretic preprocessing algorithms."""

import math

import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a library based on tensorflow, so all of the math operations done below are deferred by constructing the tensorflow graph, and should not be using numpy.
There are tensorflow alternatives for these operations though.

Were you able to run the tests in this repo to make sure that they don't break with the change?

Copy link
Author

@maldil maldil Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your response. I'm new to this repository. Before adding the new import, I checked to see if this repo makes use of numpy APIs (https://github.com/tensorflow/transform/search?q=numpy). After confirming that , I automatically analyzed the code to identify any missed chances to use the APIs. Please close the PR if this script is not intended to use numpy APIs. Regarding test. No I did not. Now I'm trying to run the test. However, I am having difficulty locating the test scripts. Any assistance is much appreciated.

# math.log2 was added in Python 3.3
log2 = getattr(math, 'log2', lambda x: math.log(x, 2))

Expand Down Expand Up @@ -52,12 +52,10 @@ def calculate_partial_expected_mutual_information(n, x_i, y_j):
if x_i == 0 or y_j == 0:
return 0
coefficient = (-log2(x_i) - log2(y_j) + log2(n))
sum_probability = 0.0
partial_result = 0.0
for n_j, p_j in _hypergeometric_pmf(n, x_i, y_j):
if n_j != 0:
partial_result += n_j * (coefficient + log2(n_j)) * p_j
sum_probability += p_j
hyp_geo_pmf = _hypergeometric_pmf(n, x_i, y_j)
sum_probability = np.sum([p_j for n_j, p_j in hyp_geo_pmf])
partial_result = np.sum([n_j * (coefficient + log2(n_j)) * p_j for n_j, p_j in hyp_geo_pmf if n_j != 0])

# The values of p_j should sum to 1, but given approximate calculations for
# log2(x) and exp2(x) with large x, the full pmf might not sum to exactly 1.
# We correct for this by dividing by the sum of the probabilities.
Expand Down