[BUG] Linear regression model predicts NaN values only #3210

Open
wrigleyDan opened this issue Nov 11, 2024 · 7 comments · May be fixed by #3291
Labels
bug Something isn't working

Comments

@wrigleyDan

What is the bug?
I trained a linear regression model with 5000 features, and calling the _predict API returns only NaN values.

I cannot rule out that my parameters are suboptimal and that this causes the NaN predictions. I tried smaller learning rates without success, but I did not experiment with all available parameters and parameter values.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Get the features at https://gist.github.com/wrigleyDan/a83a5d8294aa0ed493e4feb8cc9d7433
  2. Get the notebook to see how I ingest the data, train a model, predict a value: https://gist.github.com/wrigleyDan/16deb9cd8201ec502acda036c0b150b5
  3. Run the notebook with the feature data
  4. See NaN as the predicted value

What is the expected behavior?
The model should return reasonable predictions rather than only NaN values; in the given example, values between 0 and 1.
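
A minimal sketch of how the expected behavior could be checked on a parsed _predict response. It assumes the response body follows the `prediction_result` -> `rows` -> `values` -> `value` layout; that layout, the helper names `extract_predictions` and `summarize`, and the sample data are illustrative assumptions, not taken from the notebook.

```python
import math


def extract_predictions(response_body):
    """Flatten the numeric values out of a parsed predict response (assumed layout)."""
    rows = response_body.get("prediction_result", {}).get("rows", [])
    # float() also accepts the string "NaN", in case the value arrives as text.
    return [float(cell["value"]) for row in rows for cell in row.get("values", [])]


def summarize(values, low=0.0, high=1.0):
    """Count NaN predictions and predictions inside the expected [low, high] range."""
    nan_count = sum(1 for v in values if math.isnan(v))
    in_range = sum(1 for v in values if not math.isnan(v) and low <= v <= high)
    return {"total": len(values), "nan": nan_count, "in_range": in_range}


# Hypothetical response with one NaN and one plausible prediction.
sample = {
    "prediction_result": {
        "rows": [
            {"values": [{"column_type": "DOUBLE", "value": float("nan")}]},
            {"values": [{"column_type": "DOUBLE", "value": 0.42}]},
        ]
    }
}

summary = summarize(extract_predictions(sample))
print(summary)  # {'total': 2, 'nan': 1, 'in_range': 1}
```

A non-zero `nan` count for a trained model would match the behavior reported here.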

What is your host/environment?

  • OpenSearch v2.16.0

Do you have any screenshots?
See the linked Gist with a notebook example and the data used as features.

Do you have any additional context?
Initially reported in the #ml OpenSearch Slack channel: https://opensearch.slack.com/archives/C05BGJ1N264/p1731077205560749

@b4sjoo
Collaborator

b4sjoo commented Nov 19, 2024

Taking a look

@wrigleyDan
Author

Any news on this one @b4sjoo?

@dhrubo-os
Collaborator

@b4sjoo any update on this?

@rithin-pullela-aws
Contributor

@dhrubo-os can you please assign this issue to me?

@dhrubo-os
Collaborator

@rithin-pullela-aws I just assigned it to you. Thanks for looking into this.

@rithin-pullela-aws
Contributor

Hi @wrigleyDan, experimenting with different optimisers and learning rates results in better model weights and responses.

I used ADA_GRAD and got outputs between 0 and 1:

import json

import requests

url = "http://localhost:9200/_plugins/_ml/_train/linear_regression"
# Assumption: `headers` is defined earlier in the linked notebook; a JSON content type is used here.
headers = {"Content-Type": "application/json"}

# Train on the nine feature columns, with "neuralness" as the regression target.
payload = {
    "parameters": {
        "target": "neuralness",
        "learningRate": 0.01,
        "optimiser": "ADA_GRAD"
    },
    "input_query": {
        "_source": ["neuralness", "f_1_num_of_terms", "f_2_query_length", "f_3_has_numbers", "f_4_has_special_char", "f_5_num_results",
                    "f_6_max_title_score", "f_7_sum_title_scores", "f_8_max_semantic_score", "f_9_avg_semantic_score"],
        "size": 10000
    },
    "input_index": [
        "features"
    ]
}

response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
print(response.json())
linear_model_id = response.json()['model_id']
print(f"Created model {linear_model_id}")

OpenSearch uses Tribuo to perform linear regression; please see this bug report on Tribuo for a more detailed explanation.
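
To try several optimiser and learning-rate combinations, the train payload from this comment can be generated in a loop. The following is only a sketch: `build_train_payloads` and `FEATURE_FIELDS` are hypothetical names, the parameter keys are copied from the payload in this comment, and any optimiser name other than ADA_GRAD would need to be checked against the ML Commons documentation. It builds the payloads without sending them.

```python
import itertools

# Field list copied from the payload in this comment.
FEATURE_FIELDS = [
    "neuralness", "f_1_num_of_terms", "f_2_query_length", "f_3_has_numbers",
    "f_4_has_special_char", "f_5_num_results", "f_6_max_title_score",
    "f_7_sum_title_scores", "f_8_max_semantic_score", "f_9_avg_semantic_score",
]


def build_train_payloads(optimisers, learning_rates, target="neuralness", index="features"):
    """Return one train payload per (optimiser, learning rate) combination."""
    payloads = []
    for optimiser, rate in itertools.product(optimisers, learning_rates):
        payloads.append({
            "parameters": {"target": target, "learningRate": rate, "optimiser": optimiser},
            "input_query": {"_source": FEATURE_FIELDS, "size": 10000},
            "input_index": [index],
        })
    return payloads


# Only ADA_GRAD is confirmed in this thread; the learning rates are arbitrary examples.
payloads = build_train_payloads(["ADA_GRAD"], [0.1, 0.01, 0.001])
for p in payloads:
    print(p["parameters"])
```

Each payload can then be posted to the _train endpoint as in the snippet in this comment, and the resulting models compared on their predictions.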

@Craigacp

For Tribuo's linear regressions, it's probably better to default to AdaGrad with some reasonable learning rate rather than SGD with a constant learning rate, which is very tricky to tune correctly. We provide a default LogisticRegressionTrainer which uses AdaGrad and other default parameter choices, but we didn't provide one for linear regression (mostly because it didn't appear in our demos as much as logistic regression did).
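
The difference between the two optimisers can be illustrated with a toy example in plain Python. This is a sketch under stated assumptions, not Tribuo's implementation or the OpenSearch code path: the data, the `train` function, and the hyperparameters are made up for illustration. With an unscaled feature, constant-rate SGD overshoots on every step until the weights overflow to a non-finite value, whereas AdaGrad divides each step by the root of the accumulated squared gradients, so a single update can never move a weight by more than the learning rate.

```python
import math

# Toy data (hypothetical): one unscaled feature, target = 0.005 * x.
xs = [10.0 * i for i in range(1, 11)]
ys = [0.005 * x for x in xs]


def train(optimiser, learning_rate=0.01, epochs=200, eps=1e-8):
    """Fit y = w * x + b on squared error with per-sample updates."""
    w, b = 0.0, 0.0
    g2_w, g2_b = 0.0, 0.0  # AdaGrad accumulators of squared gradients
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            grad_w, grad_b = 2.0 * err * x, 2.0 * err
            if optimiser == "sgd":
                # Constant step: the effective step grows with x squared.
                w -= learning_rate * grad_w
                b -= learning_rate * grad_b
            else:
                # AdaGrad: normalise each step by the accumulated gradient magnitude.
                g2_w += grad_w * grad_w
                g2_b += grad_b * grad_b
                w -= learning_rate * grad_w / (math.sqrt(g2_w) + eps)
                b -= learning_rate * grad_b / (math.sqrt(g2_b) + eps)
    return w, b


w_sgd, b_sgd = train("sgd")
w_ada, b_ada = train("adagrad")
print("constant-rate SGD finite:", math.isfinite(w_sgd))  # False
print("AdaGrad finite:", math.isfinite(w_ada))            # True
```

Scaling the features or using a much smaller constant learning rate would also keep plain SGD finite here, which is the tuning difficulty described above.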

Projects
Status: In Progress
5 participants