This feature is available only to customers on the Workgroup plan. Visit our pricing page for more information.

Run SigOpt in Parallel

Running an experiment on several machines at once is easy and natural with the SigOpt API.

Before you start running experiments in parallel, make sure you know how to Create an Experiment, and that you feel comfortable with the basics of the Optimization Loop.

Create an Experiment

Create your experiment on a master machine. You only need to perform this step once. Make a note of your experiment's id because you'll need it in the next step.

If you're logged in, you can also track your experiment's progress on your Experiment Dashboard.

Initialize the Workers

Initialize each of your workers with the EXPERIMENT_ID of the experiment that you just created.

All workers, whether individual threads or machines, will receive the same experiment id.
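One lightweight way to hand every worker the same experiment id is through environment variables set by whatever launches the worker process. This is only a sketch; the variable names below are illustrative, not a SigOpt convention.

```python
import os

def get_worker_config():
    # Read the shared API token and experiment id from the
    # environment; every worker receives the same values.
    # (SIGOPT_API_TOKEN / SIGOPT_EXPERIMENT_ID are example names.)
    return {
        "api_token": os.environ["SIGOPT_API_TOKEN"],
        "experiment_id": os.environ["SIGOPT_EXPERIMENT_ID"],
    }
```

You could equally pass these values as command-line arguments; the important point is that every worker, however it is launched, ends up with the same EXPERIMENT_ID.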

Run the Optimization Loop in Parallel

Now, start the optimization loop on each worker machine. Workers will individually communicate with SigOpt's API, creating Suggestions, evaluating your metric, and then creating Observations.

Why This Works

A major benefit of SigOpt's parallelization is that each worker communicates individually with the SigOpt API, so you do not need to worry about task management.

SigOpt acts as a distributed scheduler for your Suggestions, ensuring that each worker machine receives the best possible Suggestion at the moment it creates a new Suggestion. SigOpt tracks which Suggestions are currently open, so machines independently creating Suggestions will not receive duplicates.

Using Metadata

Metadata consists of user-provided key/value pairs that SigOpt stores on your behalf under the metadata field. Metadata on Observations can be inspected using both the API and the web interface, making it ideally suited for tracking information about your distributed system.

As a starting point, we recommend tracking a unique tag for each machine in the metadata of Observations. As your distributed job is running, you can view which machines have most recently reported Observations on the experiment's web dashboard.
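As a sketch of how you might summarize that per-machine information yourself, the helper below maps each hostname to the timestamp of its most recent Observation, which makes it easy to spot machines that have stopped reporting. The Observations here are plain dicts mirroring the fields on the API objects; in practice you would build them from the results of `conn.experiments(experiment_id).observations().fetch()`.

```python
def latest_report_per_host(observations):
    # Map each hostname (stored under the metadata field) to the
    # 'created' timestamp of its most recent Observation.
    latest = {}
    for obs in observations:
        metadata = obs.get("metadata") or {}
        hostname = metadata.get("hostname")
        if hostname is None:
            continue
        if obs["created"] > latest.get(hostname, 0):
            latest[hostname] = obs["created"]
    return latest
```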

Show Me the Code

These code snippets combine the suggested master/worker division of labor with metadata that tracks which machines have reported Observations.

Master: Create Experiment, Spin up Workers

from sigopt import Connection

def master(api_token, num_workers=1):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)

    # Create the experiment on master
    experiment = conn.experiments().create(
        name="Classifier Accuracy",
        parameters=[
            {
                'bounds': {
                  'max': 1.0,
                  'min': 0.001
                },
                'name': 'gamma',
                'type': 'double'
            }
        ],
    )

    for _ in range(num_workers):
        # Launch a worker and run the run_worker
        # function (below) on the worker machine
        # You implement this function
        spin_up_worker(
            api_token=api_token,
            experiment_id=experiment.id,
        )

Worker: Run Optimization Loop with Metadata

import socket
from sigopt import Connection

# Each worker runs the same optimization loop
# for the experiment created on master
def run_worker(api_token, experiment_id):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)

    # Keep track of the hostname for logging purposes
    hostname = socket.gethostname()

    for _ in range(40):
        # Receive a Suggestion
        suggestion = conn.experiments(experiment_id).suggestions().create()

        # Evaluate Your Metric
        # You implement this function
        value = evaluate_metric(suggestion.assignments)

        # Report an Observation
        # Include the hostname so that you can track
        # progress on the web interface
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )

Recovering From Machine Failure

Recovering Open Suggestions

If one or more of your machines fail, you may be left with Suggestions in the open state. You can list the open Suggestions and continue working on them:

suggestions = conn.experiments(experiment_id).suggestions().fetch(state="open")
for suggestion in suggestions.iterate_pages():
    value = evaluate_metric(suggestion.assignments)  # implement this
    conn.experiments(experiment_id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )

Or you can simply delete open Suggestions:

conn.experiments(experiment_id).suggestions().delete(state="open")