The domain adaptation phase seems not reproducible #7

Open
Howard-Hsiao opened this issue Jan 31, 2024 · 0 comments
Hello,
I recently tried your program, but the results did not meet my expectations. Following the hyperparameter configuration you provided for the RHD->H3D_crop protocol, the best score I could achieve on the H3D_crop target domain was 0.689, so the domain adaptation phase does not seem to be working as expected. I was wondering whether any hyperparameter settings were left out of the README.md, and whether you could offer some guidance on how to address this. Thanks in advance for your help.

First of all, there were issues with the train_sfda.py script. I made some code revisions, which I describe in the following sections.

1. restructure the optimizer.step part

        with torch.cuda.amp.autocast():
            y_t_sr = model_sr(x_t_stu)
            y_t_in = model_in(x_t_stu)

            loss_ft = criterion(y_t_sr, y_t_in)
            loss_res = res_criterion(y_t_sr, y_t_in)
            loss_in = loss_ft + 0.7 * loss_res

        scaler_sr.scale(loss_ft).backward()
        scaler_sr.step(sr_optimizer)
        scaler_sr.update()

        scaler_in.scale(loss_in).backward()
        scaler_in.step(in_optimizer)
        scaler_in.update()

to

        with torch.cuda.amp.autocast():
            y_t_sr = model_sr(x_t_stu)
            with torch.no_grad():
                y_t_in = model_in(x_t_stu)
            loss_ft = criterion(y_t_sr, y_t_in)

        scaler_sr.scale(loss_ft).backward()
        scaler_sr.step(sr_optimizer)
        scaler_sr.update()

        with torch.cuda.amp.autocast():
            with torch.no_grad():
                y_t_sr = model_sr(x_t_stu)
            y_t_in = model_in(x_t_stu)

            loss_ft = criterion(y_t_sr, y_t_in)
            loss_res = res_criterion(y_t_sr, y_t_in)
            loss_in = loss_ft + 0.7 * loss_res

        loss_in.backward()
        in_optimizer.step()

Without these adjustments, I encountered a series of errors, such as:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

After I changed the code to scaler_sr.scale(loss_ft).backward(retain_graph=True), I got:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256]] is at version 37; expected version 36 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
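
For reference, here is a minimal, self-contained sketch of why this happens, using toy models rather than the repo's code: when both losses are built from the same forward passes, they share one autograd graph, so the first backward() frees it and the second backward() fails; with retain_graph=True, the first optimizer step then modifies parameters in place that the second backward still needs, which trips the version check. Detaching the other model's output with torch.no_grad() gives each loss its own graph, which is what the restructured code above does.

    import torch
    import torch.nn as nn

    model_a = nn.Linear(8, 8)   # toy stand-ins for model_sr / model_in
    model_b = nn.Linear(8, 8)
    opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
    opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
    x = torch.randn(4, 8)

    # Failing pattern: both losses share the graph built from the same forward passes.
    y_a, y_b = model_a(x), model_b(x)
    loss_a = (y_a - y_b).pow(2).mean()
    loss_b = (y_a - y_b).abs().mean()
    loss_a.backward()       # frees the shared graph
    opt_a.step()            # in-place parameter update
    try:
        loss_b.backward()   # RuntimeError: backward through the graph a second time
    except RuntimeError as err:
        print("expected failure:", err)

    # Working pattern: detach the other model's output so each loss has its own graph.
    opt_a.zero_grad()
    opt_b.zero_grad()
    with torch.no_grad():
        y_b_const = model_b(x)
    loss_a = (model_a(x) - y_b_const).pow(2).mean()
    loss_a.backward()
    opt_a.step()

    with torch.no_grad():
        y_a_const = model_a(x)
    loss_b = (model_b(x) - y_a_const).abs().mean()
    loss_b.backward()
    opt_b.step()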

2. im_criterion

I modified the code to use loss_im = im_criterion(y_t_tg, -1) instead of loss_im = im_criterion(y_t_in, y_t_tg) in an attempt to align with the paper.
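
I am not certain what im_criterion computes internally, so take this only as my reading of the call: with the second argument reduced to a constant, it looks like a single-prediction information-maximization (entropy) objective. A rough, hypothetical sketch of that kind of loss over keypoint heatmaps (names and shapes are my assumptions, not the repo's) would be:

    import torch
    import torch.nn.functional as F

    def im_loss_sketch(heatmaps: torch.Tensor, _placeholder: int = -1) -> torch.Tensor:
        # Hypothetical: treat each keypoint's (H, W) heatmap as a distribution over
        # pixel locations and minimize its entropy; the real im_criterion may differ.
        b, k, h, w = heatmaps.shape
        logits = heatmaps.view(b, k, h * w)
        probs = F.softmax(logits, dim=-1)
        log_probs = F.log_softmax(logits, dim=-1)
        entropy = -(probs * log_probs).sum(dim=-1)   # per-sample, per-keypoint entropy
        return entropy.mean()                        # scalar loss

    y_t_tg = torch.randn(2, 21, 64, 64, requires_grad=True)   # (B, K, H, W) toy heatmaps
    loss_im = im_loss_sketch(y_t_tg, -1)
    loss_im.backward()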

3. CST_Loss

I modified the code to make the loss a scalar so that I can call loss.backward().
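
Concretely, the change amounts to reducing the per-sample/per-joint loss tensor to a scalar before calling backward(). A toy stand-in for CST_Loss (not the actual implementation) to show what I mean:

    import torch

    # Hypothetical per-joint loss standing in for CST_Loss; it returns a (B, K) tensor.
    def cst_loss_per_joint(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return (pred - target).pow(2).sum(dim=-1)

    pred = torch.randn(4, 21, 2, requires_grad=True)   # toy keypoint predictions
    target = torch.randn(4, 21, 2)

    loss = cst_loss_per_joint(pred, target)
    if loss.dim() > 0:
        loss = loss.mean()   # reduce to a scalar so loss.backward() works
    loss.backward()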
