
more iteration than setting #65

Open
wt12318 opened this issue May 30, 2022 · 6 comments

Comments


wt12318 commented May 30, 2022

Hi,

When I set num_iteration to 50, the actual number of iterations run is more than 50:

from mango import Tuner

config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50

tuner = Tuner(HYPERPARAMETERS,
              objective=run_one_training,
              conf_dict=config)
results = tuner.minimize()

The MLflow UI shows that it has run 62 iterations:
[screenshot: MLflow run list showing 62 runs]
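
To double-check the count, a minimal sketch that lists the logged runs (assuming each objective call opens its own mlflow.start_run() and the runs above were logged to the active MLflow experiment with the default tracking setup):

import mlflow

# One objective evaluation corresponds to one MLflow run here;
# search_runs() returns a pandas DataFrame with one row per run.
runs = mlflow.search_runs()
print(len(runs))  # 62 in the screenshot above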

@sandeep-iitr
Collaborator

Hi,
Thanks for asking this question.

Internally, Mango runs a few random iterations to properly initialize the optimizer.
By default, the number of these random iterations is 2.
You can change this with the config parameter 'initial_random'.
So, in most cases, your total number of iterations will be num_iteration + initial_random.

However, this parameter is only a suggestion to the optimizer, and in some cases it may run more random iterations to get a proper initialization. This happens for problems where the variation in the objective value is very small; Mango may then internally decide to run more random iterations to make sure it finds good regions of the hyperparameter space. For most problems, setting initial_random bounds the total number of iterations as expected.

This can also happen when some of the random iterations do not succeed and your objective function handles those failures; Mango then runs additional random iterations to make sure the requested number of random iterations (2 by default) succeed.
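
For example, a minimal sketch of bounding the expected total number of evaluations (reusing the HYPERPARAMETERS and run_one_training from the snippet above; the expected total is num_iteration + initial_random):

from mango import Tuner

config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50
config["initial_random"] = 2   # default value; expected total evaluations: 50 + 2 = 52

tuner = Tuner(HYPERPARAMETERS,
              objective=run_one_training,
              conf_dict=config)
results = tuner.minimize()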


wt12318 commented May 30, 2022

Thank you

wt12318 closed this as completed May 30, 2022
wt12318 reopened this Jun 9, 2022

wt12318 commented Jun 9, 2022

Hi,

I set initial_random to one, but it still runs more iterations than I set. Also, the total number of combinations of all my parameters is 36, yet it runs more than 36 iterations. Why does this happen?

Thank you.

@sandeep-iitr
Copy link
Collaborator

Can you share more details about your parameter space and the definition of your objective function?


wt12318 commented Jun 10, 2022

Thank you for your reply. Here are my objective function and parameter space:

@scheduler.parallel(n_jobs=36)
def run_one_training(**params):
    with mlflow.start_run() as run:
        # Log parameters used in this experiment
        for key in params.keys():
            mlflow.log_param(key, params[key])

        # Loading the dataset
        print("Loading dataset...")
        train_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="train_dt.csv",aaindex=aaindex, test=False, val=False)
        test_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="val_dt.csv", aaindex=aaindex, test=False, val=True)

        # Prepare training
        train_loader = DataLoader(train_dataset, batch_size=params["batch_size"], shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=params["batch_size"], shuffle=True)

        # Loading the model
        print("Loading model...")
        model_params = {k: v for k, v in params.items() if k.startswith("model_")}
        model = GNN(feature_size=train_dataset[0].x.shape[1], model_params=model_params) 
        model = model.to(device)
        print(f"Number of parameters: {count_parameters(model)}")
        mlflow.log_param("num_params", count_parameters(model))

        # BCEWithLogitsLoss; a pos_weight < 1 would favor precision, > 1 recall (not set here)
        loss_fn = torch.nn.BCEWithLogitsLoss()
        optimizer = torch.optim.Adam(model.parameters(), 
                                    lr=params["learning_rate"],
                                    weight_decay=0)
        #scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=params["scheduler_gamma"])
        
        # Start training
        best_loss = 1000
        early_stopping_counter = 0
        for epoch in range(20): 
            if early_stopping_counter <= 5:  # stop after 5 epochs without improvement
                # Training
                model.train()
                loss = train_one_epoch(epoch, model, train_loader, optimizer, loss_fn)
                print(f"Epoch {epoch} | Train Loss {loss}")
                mlflow.log_metric(key="Train loss", value=float(loss), step=epoch)

                # Testing
                model.eval()
                if epoch % 1 == 0:
                    loss = test(epoch, model, test_loader, loss_fn)
                    print(f"Epoch {epoch} | Test Loss {loss}")
                    mlflow.log_metric(key="Test loss", value=float(loss), step=epoch)
                    
                    # Update best loss
                    if float(loss) < best_loss:
                        best_loss = loss
                        # Save the currently best model 
                        mlflow.pytorch.log_model(model, "model", signature=SIGNATURE)
                        
                        early_stopping_counter = 0
                    else:
                        early_stopping_counter += 1

            else:
                print("Early stopping due to no improvement.")
                return [best_loss]
    print(f"Finishing training with best test loss: {best_loss}")
    return [best_loss]

HYPERPARAMETERS = {
    "batch_size": [32,64,128],
    "learning_rate": [0.001,0.0001],
    "model_embedding_size": [32,64,128],
    "model_layers": [2,3],
    "model_dropout_rate": [0.5]
}

torch.set_num_threads(36)
torch.manual_seed(2022060801)
print("Running hyperparameter search...")
config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 36
config["initial_random"] = 1

tuner = Tuner(HYPERPARAMETERS, 
              run_one_training,
              config) 
results = tuner.minimize()

[screenshot: MLflow run list]
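
For reference, a minimal sketch (copying the HYPERPARAMETERS dict above) that counts the number of discrete combinations in this search space:

import math

HYPERPARAMETERS = {
    "batch_size": [32, 64, 128],
    "learning_rate": [0.001, 0.0001],
    "model_embedding_size": [32, 64, 128],
    "model_layers": [2, 3],
    "model_dropout_rate": [0.5],
}

# 3 * 2 * 3 * 2 * 1 = 36 discrete combinations
n_combinations = math.prod(len(choices) for choices in HYPERPARAMETERS.values())
print(n_combinations)  # prints 36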

@sandeep-iitr
Collaborator

Hi,
Thanks for providing the details. I have been a little busy with an immediate deadline over the last few days.
I will work on reproducing this issue next week and will update you with a solution or more information.
