GPU-accelerated run of 2024 Example fails #2

Richert · 2025-02-04T23:12:11Z

Hi!

I am currently trying to run the following example jupyter notebook: https://github.com/engellab/neuralflow/blob/master/examples/2024_the_dynamics_and_geometry/Example2.ipynb.

Due to the time it takes to run the optimization, I decided to run it on my GPU. However, I am running into a neuralflow-specific error, with the following traceback:

Step 3: Perform optimization.

NOTE: population optimization may take a lot of time. To accelerate execution of the next cell, reduce "max_epoch" and/or make "epoch_schedule" in C_opt and D_opt empty
# The optimization was performed in a grid with Np = 8, Ne = 64. Here we set Ne to 16 to reduce fitting time
grid = neuralflow.GLLgrid(Np = 8, Ne = 16, with_cuda = use_gpu)

# Initial guess
init_model = neuralflow.model.new_model(
    peq_model = {"model": "uniform", "params": {}},
    p0_model = {"model": "cos_square", "params": {}},
    D = 1,
    fr_model = [{"model": "linear", "params": {"slope": 1, "bias": 100}}] * 14,
    params_size={'peq': 4, 'D': 1, 'fr': 1, 'p0': 1},
    grid = grid,
    with_cuda = use_gpu
)

optimizer = 'ADAM'

# In the paper we set max_epoch = 5000, mini_batch_number = 20, and did 30 line searches logarithmically scattered across 5000 epochs.
# Here we change these parameters to reduce optimization time
opt_params = {'max_epochs': 50, 'mini_batch_number': 20, 'params_to_opt': ['F', 'F0', 'D', 'Fr', 'C'], 'learning_rate': {'alpha': 0.05}}
ls_options = {'C_opt': {'epoch_schedule': [0, 1, 5, 30], 'nSearchPerEpoch': 3, 'max_fun_eval': 2}, 'D_opt': {'epoch_schedule': [0, 1, 5, 30], 'nSearchPerEpoch': 3, 'max_fun_eval': 25}}
boundary_mode = 'absorbing'

# Train on datasample 1
dataTR = [v for v in datasample1.values()]
optimization1 = neuralflow.optimization.Optimization(
                    dataTR,
                    init_model,
                    optimizer,
                    opt_params,
                    ls_options,
                    boundary_mode=boundary_mode,
                    device=device
                )

# run optimization
print('Running optimization on datasample 1')
optimization1.run_optimization()

# Train on datasample 2
dataTR = [v for v in datasample2.values()]
optimization2 = neuralflow.optimization.Optimization(
                    dataTR,
                    init_model,
                    optimizer,
                    opt_params,
                    ls_options,
                    boundary_mode=boundary_mode,
                    device=device
                )

# run optimization
print('Running optimization on datasample 2')
optimization2.run_optimization()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 25
     23 # Train on datasample 1
     24 dataTR = [v for v in datasample1.values()]
---> 25 optimization1 = neuralflow.optimization.Optimization(
     26                     dataTR,
     27                     init_model,
     28                     optimizer,
     29                     opt_params,
     30                     ls_options,
     31                     boundary_mode=boundary_mode,
     32                     device=device
     33                 )
     35 # run optimization
     36 print('Running optimization on datasample 1')

File ~/miniforge3/envs/langevin/lib/python3.10/site-packages/neuralflow/optimization.py:88, in Optimization.__init__(self, dataTR, init_model, optimizer_name, opt_options, line_search_options, pde_solve_params, boundary_mode, save_options, dataCV, device)
     86 self.optimizer_name = optimizer_name
     87 # Optimizer object - see base_optimization class
---> 88 self.optimizer = opt_fun.initialize(
     89     dataTR, init_model, opt_options, line_search_options,
     90     pde_solve_params, boundary_mode,
     91     save_options, dataCV, device
     92 )
     93 # Initialize ruinning average, RMS prop, and epoch counter needed for
     94 # ADAM algorithm.
     95 if optimizer_name == 'ADAM':

File ~/miniforge3/envs/langevin/lib/python3.10/site-packages/neuralflow/base_optimizer.py:448, in adam_opt.initialize(cls, dataTR, init_model, opt_options, line_search_options, pde_solve_params, boundary_mode, save_options, dataCV, device)
    446 # Check learning rate parameters for Adam:
    447 adam_opt._check_optimization_options(opt_options)
--> 448 return cls(
    449     dataTR, init_model, opt_options, line_search_options,
    450     pde_solve_params, boundary_mode, save_options, dataCV, device
    451 )

File ~/miniforge3/envs/langevin/lib/python3.10/site-packages/neuralflow/base_optimizer.py:108, in optimizer.__init__(self, dataTR, init_model, opt_options, line_search_options, pde_solve_params, boundary_mode, save_options, dataCV, device)
    102 self.shared_params = [
    103     k for k in opt_options['params_to_opt']
    104     if self.model.params_size[self.opt_model_map[k]] == 1
    105 ]
    107 # Number of trials in each datasample
--> 108 self.num_trial = [
    109     len(self.get_dataTR(samp)) for samp in range(self.num_datasamples)
    110 ]
    112 # Size of minibatch for each sample
    113 self.mini_batch_size = [
    114     math.ceil(tr_size / opt_options['mini_batch_number'])
    115     for tr_size in self.num_trial
    116 ]

File ~/miniforge3/envs/langevin/lib/python3.10/site-packages/neuralflow/base_optimizer.py:109, in <listcomp>(.0)
    102 self.shared_params = [
    103     k for k in opt_options['params_to_opt']
    104     if self.model.params_size[self.opt_model_map[k]] == 1
    105 ]
    107 # Number of trials in each datasample
    108 self.num_trial = [
--> 109     len(self.get_dataTR(samp)) for samp in range(self.num_datasamples)
    110 ]
    112 # Size of minibatch for each sample
    113 self.mini_batch_size = [
    114     math.ceil(tr_size / opt_options['mini_batch_number'])
    115     for tr_size in self.num_trial
    116 ]

File ~/miniforge3/envs/langevin/lib/python3.10/site-packages/neuralflow/base_optimizer.py:156, in optimizer.get_dataTR(self, nsample)
    154 if self.device == 'CPU':
    155     return self.dataTR[nsample].data
--> 156 return self.dataTR[nsample].cuda_var.data

AttributeError: 'var' object has no attribute 'data'

FYI - I added two variables device and use_gpu to the cell for Step 2:

# We analyze the data starting from 120 ms from stimulus onset. This accounts for the delay between stimulus onset
# and the emergence of decision-making dynamics in PMd.
time_offset = 0.12
device = "CPU"  # "CPU" runs without errors; set to "GPU" to reproduce the AttributeError
use_gpu = device == "GPU"
datasample1, datasample2 = {}, {}
for stim_difficulty in ['hard', 'easy']:
    for chosen_side in ['Left', 'Right']:
        
        # Filter data for the current condition
        data_cur = data[(data.chosen_side == chosen_side) & (data.stim_difficulty == stim_difficulty)].reset_index()
        
        # Align to stimulus onset and subtract time offset. Set t=0 to 120 ms from stimulus onset, and stop at RT
        for u in range(num_neurons):
            data_cur.loc[:, f'neuron_{u}'] = data_cur[f'neuron_{u}'] - data_cur['stim_onset'] - time_offset
            data_cur.loc[:, f'neuron_{u}'] =  data_cur.apply(lambda x: x[f'neuron_{u}'][(x[f'neuron_{u}'] >= 0) & (x[f'neuron_{u}'] <= x.RT - time_offset)], axis = 1)
        
        # Collect spikes into 2D array for each trial
        data_cur = data_cur.assign(spikes=data_cur[[f'neuron_{i}' for i in range(num_neurons)]].values.tolist())
        
        # time epoch
        data_cur['time_epoch'] = data_cur.RT.apply(lambda x: (0, x - time_offset))
        
        # Assign even trials to datasample 1, and odd trials to datasample 2
        num_trials = data_cur.shape[0]
        
        ind1 = np.arange(0, num_trials, 2)
        datasample1[f'{chosen_side}_{stim_difficulty}'] = neuralflow.SpikeData(
            data = np.array(data_cur.loc[ind1,'spikes'].to_list(), dtype=np.ndarray).T, dformat = 'spiketimes', time_epoch = data_cur.loc[ind1, 'time_epoch'].to_list(), with_cuda = use_gpu
        )
        ind2 = np.arange(1, num_trials, 2)
        datasample2[f'{chosen_side}_{stim_difficulty}'] = neuralflow.SpikeData(
            data = np.array(data_cur.loc[ind2,'spikes'].to_list(), dtype=np.ndarray).T, dformat = 'spiketimes', time_epoch = data_cur.loc[ind2, 'time_epoch'].to_list(), with_cuda = use_gpu
        )
        
        # Convert to ISI format
        datasample1[f'{chosen_side}_{stim_difficulty}'].change_format('ISIs')
        datasample2[f'{chosen_side}_{stim_difficulty}'].change_format('ISIs')

and also used them in Step 3, in the cell shown above with the traceback (with_cuda = use_gpu for the grid and the initial model, device=device for both Optimization calls).

Other than that, I did not change anything. If I run the optimization with device="CPU", it starts without any errors.
If you have an idea what might be going wrong, I would appreciate the help!

Best, Richard

MikGen · 2025-02-09T04:36:45Z

Hello, thank you for your question and your interest in our framework!
From the traceback, it seems that the training data was not transferred to the GPU. As per https://github.com/engellab/neuralflow/blob/master/tests/test_optimization.py#L74, after creating the training data (and validation data, if you use it), you need to run dataTR.to_GPU(). This method creates a copy of the data on the GPU device.
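The failure mode can be illustrated with a toy sketch. It is not neuralflow code: `ToySpikeData` is a hypothetical stand-in, and the assumption that `with_cuda=True` only creates an empty `var` placeholder, which `to_GPU()` later fills, is inferred from the error message rather than taken from the library source. Only the `get_dataTR` branch mirrors what the traceback shows.

```python
class var:
    """Empty attribute container, standing in for the 'var' object named in the error."""


class ToySpikeData:
    """Hypothetical stand-in for neuralflow.SpikeData (NOT the real class)."""

    def __init__(self, data, with_cuda=False):
        self.data = data
        if with_cuda:
            # Assumption: with_cuda only prepares an empty placeholder.
            self.cuda_var = var()

    def to_GPU(self):
        # Assumption: this is where the device copy is created.
        # A plain list copy stands in for a real GPU array.
        self.cuda_var.data = list(self.data)


def get_dataTR(sample, device):
    """Same branching as the get_dataTR lines shown in the traceback."""
    if device == 'CPU':
        return sample.data
    return sample.cuda_var.data


sample = ToySpikeData([0.1, 0.2, 0.3], with_cuda=True)

# Without to_GPU(), the placeholder has no 'data' attribute yet.
try:
    get_dataTR(sample, 'GPU')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# After to_GPU(), the same access succeeds.
sample.to_GPU()
gpu_copy = get_dataTR(sample, 'GPU')
```

Under these assumptions the first access fails with the same kind of AttributeError as in the report, and the access after `to_GPU()` returns the copied data.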

In summary, to run the code on GPU one needs to do the following:

1. Create the initial-guess model with with_cuda=True. If a grid object is used to create the initial model, the grid should also be created with with_cuda=True.
2. Transfer the training/validation spike data to GPU memory by executing dataTR.to_GPU().
3. Pass device="GPU" to the Optimization class upon initialization.

I will consider improving the code by automating this step.
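Until that is automated, the transfer step could be wrapped in a small helper. This is a minimal sketch, not part of neuralflow: `prepare_training_data` and `_StubSpikeData` are hypothetical names. It also assumes that `to_GPU()` must be called on each SpikeData object, since `dataTR` in the notebook is a list rather than a single object. The stub class only exists so the sketch runs without neuralflow or a GPU.

```python
def prepare_training_data(datasample, device):
    """Collect the SpikeData objects of one datasample into a list for Optimization.

    datasample: dict mapping condition name -> SpikeData-like object.
    device: 'CPU' or 'GPU', the same string later passed to Optimization.
    """
    if device not in ('CPU', 'GPU'):
        raise ValueError(f"device must be 'CPU' or 'GPU', got {device!r}")
    data_list = list(datasample.values())
    if device == 'GPU':
        # to_GPU() is the method named in the reply above; calling it on each
        # list element (rather than on the list itself) is an assumption.
        for spike_data in data_list:
            spike_data.to_GPU()
    return data_list


class _StubSpikeData:
    """Hypothetical stand-in that only records whether to_GPU() was called."""

    def __init__(self):
        self.on_gpu = False

    def to_GPU(self):
        self.on_gpu = True


stub_sample = {'Left_hard': _StubSpikeData(), 'Right_hard': _StubSpikeData()}

# On CPU nothing is transferred.
cpu_list = prepare_training_data(stub_sample, 'CPU')
cpu_flags = [d.on_gpu for d in cpu_list]

# On GPU every object in the list receives a to_GPU() call.
gpu_list = prepare_training_data(stub_sample, 'GPU')
gpu_flags = [d.on_gpu for d in gpu_list]
```

In the notebook, `dataTR = prepare_training_data(datasample1, device)` would then take the place of `dataTR = [v for v in datasample1.values()]`, with `device = "GPU"` set in Step 2 so that the grid, the initial model and the SpikeData objects are all created with `with_cuda=True`.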

Thank you,
Mikhail
