Running BlackBoxOptim on a Cluster #84

Open
pedm opened this issue Jun 19, 2018 · 3 comments

Comments

@pedm
pedm commented Jun 19, 2018

Hello,

I am working with BlackBoxOptim on a Slurm cluster, using ClusterManagers to manage the additional processes. Unfortunately, BlackBoxOptim does not run the distance function on the additional nodes.

The distance function is only run on the processes that are on the same node as the master. Do you know if there is any way to fix this? A small code sample is below, containing everything but the distance function. Thank you!


using ClusterManagers
nnodes = 3
ncores = 16
np = nnodes * ncores
@time addprocs(SlurmManager(np), t="00:30:00")

@everywhere using BlackBoxOptim
opt2 = bbsetup(distance_fcn; Method=:dxnes, SearchRange = collect(values(EST_variables)),
              NumDimensions = 2, MaxFuncEvals = 3000, Workers = workers())
@robertfeldt
Owner

Hmm, unfortunately I have very little experience running this with ClusterManagers, but I think you might need to use the preliminary code in one of our PRs to get this working. Maybe @alyst knows the current status better? We definitely want to get this up and running as soon as we have made the move to Julia 0.7/1.0, so it would be great to have more detailed test cases and feedback from you at that point, @pedm.

@robertfeldt
Owner

Ah, sorry for being terse. PR = pull request. We have two different ones related to parallel evaluation during optimization, see:

#46

and

#25

The former (#46) is closer to a final stage but still needs some work. We will probably prioritize getting a stable version up for Julia 0.7/1.0 as soon as they are out, but @alyst knows more about the status of these PRs.

@JulienPascal

Hi,

I also find that BlackBoxOptim does not use all available workers when run on a cluster with ClusterManagers.
With the example below, only workers 1 - 14 are used. Any idea why this is the case? Thank you.

using Distributed
using ClusterManagers

addWorkers = true
OnCluster = true

n_nodes = 2
n_cores_per_node = 10
maxNumberWorkers = round(Int, n_nodes*n_cores_per_node)

if addWorkers
	if OnCluster && n_nodes > 1
		print("Multiple nodes: using SlurmManager")
		addprocs(SlurmManager(maxNumberWorkers))
	else
		print("Single node")
		addprocs(maxNumberWorkers)
	end
end

@everywhere using Distributed

# Check the way workers are spread on nodes
# (relevant if on a cluster)
#------------------------------------------
hosts = []
pids = []
for i in workers()
	host, pid = fetch(@spawnat i (gethostname(), getpid()))
	println("Hello I am worker $(i), my host is $(host)")
	push!(hosts, host)
	push!(pids, pid)
end

# check the number of workers:
#----------------------------
currentWorkers = nworkers()
println("Number of workers = $(currentWorkers)")


@everywhere using BlackBoxOptim
@everywhere function slow_rosenbrock(x)
  sleep(0.001) # Fake a slower func to be optimized...
  println("I am worker $(myid())")
  println("My host is $(gethostname())")
  return BlackBoxOptim.rosenbrock(x)
end
opt = bboptimize(slow_rosenbrock, Method=:dxnes, SearchRange = (-5.0, 5.0),
              NumDimensions = 50, MaxFuncEvals = 100000, Workers = workers())

res = best_candidate(opt)
print("Minimizer: $(res)")

print("Best fitness: $(best_fitness(opt))")
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, nehalem)
