
[Question] tau_2 is dependent on tau_1 #15

Open
alabamagan opened this issue Oct 24, 2021 · 2 comments


alabamagan commented Oct 24, 2021

I am not sure if this is the intended behavior, but judging from the article on arXiv and also the code, it seems that tau_2 is dependent on tau_1, i.e., on the rate at which a feature is selected. This would effectively prioritize tau_1 over tau_2, since tau_2 gets scaled according to how often the feature is selected during feature selection.

I ran a simulation of tau_2 for one feature x at different rates of the feature being selected, and plotted the positive rate of x against tau_2 (right):

[figure: positive rate of x vs. tau_2, with the zero-rate correction (left) and without it (right)]
It clearly shows that tau_2 scales strongly with the frequency with which the feature is selected by the models. If the feature is not selected in 100% of the models, the range of tau_2 is no longer (0, 1).
Since this might be problematic, I added a scaling factor with respect to the observed zero-rate, i.e., the rate at which the feature is not selected across the K runs (left), and this brings the range back to (0, 1).

I just want to ask whether this is the intended behavior, and how it could affect the results of the feature selection?


olivertomic (Collaborator) commented Oct 25, 2021

Hi alabamagan. Thank you for investigating this, it looks interesting. The way tau_2 is designed, it can never exceed tau_1 (meaning that you are right, tau_2 is dependent on tau_1); it can only take values equal to or lower than tau_1. This means that if tau_1 takes the value 1.0, then it is fully possible that tau_2 also takes the value 1.0. In that particular case a feature has been selected across all K models (tau_1 = 1.0) and all weights have the same sign, either positive or negative (which results in tau_2 = 1.0). If, say, tau_1 = 0.7, then tau_2 cannot take a value higher than 0.7, since in only 70% of the models are the weights non-zero and therefore carrying a sign, either positive or negative.
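Roughly, the relationship can be sketched like this (just a simplified numpy illustration of the idea, not the exact code in the package):

```python
import numpy as np

def tau_1(weights):
    """Fraction of the K models in which the feature got a non-zero weight."""
    weights = np.asarray(weights, dtype=float)
    return np.mean(weights != 0)

def tau_2(weights):
    """Sign stability: absolute mean of the weight signs across the K models.
    Zero weights contribute a sign of 0, so tau_2 can never exceed tau_1."""
    weights = np.asarray(weights, dtype=float)
    return abs(np.mean(np.sign(weights)))

# Feature selected in 7 of K = 10 models, always with a positive weight:
w = [0.3, 0.5, 0.0, 0.2, 0.4, 0.0, 0.1, 0.6, 0.0, 0.2]
print(tau_1(w), tau_2(w))  # 0.7 0.7 -> tau_2 is capped at tau_1
```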

The reason why tau_2 was introduced is that we have seen for some datasets that the weights of a feature were often non-zero, giving a first impression that the feature is important because it is selected often. But then we discovered that the weights of that feature had both positive and negative signs (in addition to being relatively small). If you want to select a feature only if all non-zero weights have the same sign, then you need to set tau_2 to the same value as tau_1. If you want to be less strict about this, you may set tau_2 to a lower value.

In our article on arXiv we describe that all three criteria tau_1, tau_2 and tau_3 must be fulfilled to select a feature. In general, the user has the freedom to choose which of the three tau_1, tau_2 and tau_3 should be used to identify and select features. If the user doesn't care about tau_2 and tau_3, the user can set them to zero, meaning that they are practically eliminated as criteria.

But just to be sure I understand your plots correctly:

  1. When you say "positive rate of x", do you mean by this the percentage of the weights of x having a positive sign across K models?
  2. When you use the term "Zero-rate", do you refer to the percentage of weights of x being non-zero across K models?
  3. Does the term "Score" in the plot represent the value for tau_2?


alabamagan commented Oct 25, 2021

> But just to be sure I understand your plots correctly:
>
> 1. When you say "positive rate of x", do you mean by this the percentage of the weights of x having a positive sign across K models?
>
> 2. When you use the term "Zero-rate", do you refer to the percentage of weights of x being non-zero across K models?
>
> 3. Does the term "Score" in the plot represent the value for tau_2?

Hi, thanks for your answer. Your guesses are correct for all three points you raised.
Basically, I simulate the feature selection by sampling three values [-1, 0, 1] 1000 times with the following probabilities:
p(0) = zero-rate
p(+1) = (1 - p(0)) * pos_rate
p(-1) = (1 - p(0)) * (1 - pos_rate)
I tuned zero-rate and pos_rate and repeated the simulation, computing tau_2 at each point of the grid to get the graph on the right.
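For reference, a compact sketch of that simulation looks roughly like this (my own illustration; the `simulate_tau_2` helper just reproduces the sign-stability score discussed above, it is not taken from the package):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_tau_2(zero_rate, pos_rate, K=1000):
    """Sample K weight signs from {0, +1, -1} with the probabilities above
    and return the resulting sign-stability score for the feature."""
    p = [zero_rate,
         (1 - zero_rate) * pos_rate,
         (1 - zero_rate) * (1 - pos_rate)]
    signs = rng.choice([0, 1, -1], size=K, p=p)
    return abs(np.mean(signs))

# Sweep zero_rate and pos_rate over a grid, as in the plots:
for zero_rate in np.linspace(0.0, 0.9, 10):
    for pos_rate in np.linspace(0.0, 1.0, 11):
        score = simulate_tau_2(zero_rate, pos_rate)
        # collect (zero_rate, pos_rate, score) here for plotting
```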
What's interesting is that this scaling seems to follow a log relation instead of a linear one, such that the change in the penalty between zero-rates of 0.1 and 0.2 is far greater than that between zero-rates of 0.5 and 0.6.

It's quite solid to add tau_2 to the criteria, but my slight worry is that the coupling of tau_1 and tau_2 could shift things in a way that makes tau_1 effectively the dominant criterion. This could make tuning the threshold for tau_2 very difficult. I will probably do more tests to see if this is indeed true.

Edit 1: I made a mistake in the code and have corrected it; the relation of tau_1 with tau_2 is linear rather than exponential (so what I speculated above was wrong). But I think my worry is still valid, since this complicates the threshold grid search for tau_2, so it might be better to scale tau_2 by sum(onehot(beta != 0)) instead of by K. Updated figure here:
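Concretely, the rescaling I have in mind would be something like the following sketch (again my own illustration, not a patch against the actual implementation):

```python
import numpy as np

def tau_2_rescaled(weights):
    """Sign stability normalised by sum(onehot(beta != 0)), i.e. by the number
    of models that actually selected the feature, instead of by K. The score
    then stays in the full (0, 1) range regardless of the selection rate."""
    weights = np.asarray(weights, dtype=float)
    n_selected = np.count_nonzero(weights)
    if n_selected == 0:
        return 0.0
    return abs(np.sum(np.sign(weights))) / n_selected

# Selected in 7 of 10 models, always positive: the rescaled score is 1.0
# instead of being capped at tau_1 = 0.7.
print(tau_2_rescaled([0.3, 0.5, 0.0, 0.2, 0.4, 0.0, 0.1, 0.6, 0.0, 0.2]))  # 1.0
```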
