Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solo.read_TNR()/SolO QLI: Function is slow, and speed scales badly with amount of data #140

Open
ErikPGJ opened this issue Apr 9, 2024 · 11 comments
Assignees

Comments

@ErikPGJ
Copy link
Member

ErikPGJ commented Apr 9, 2024

Step 1: Latest code?

  • [ X] This issue is present in the most recent code (i.e. after running git pull this is still a problem).
    Should be present in latest commit on devel branch: b679b71a Erik P G Johansson (2024-04-08 12:46:41 +0200)

Step 2: Describe your environment

  • irfu-matlab branch (e.g. master, devel): devel
  • Matlab version used (e.g. Matlab R2020b): MATLAB R2023b
  • Operating system (e.g. Windows 10, Mac OS X 10.15.6, Linux Ubuntu 20.04): Ubuntu 22.04.4 LTS
  • [X ] A 64-bit installation of Matlab was used.

Step 3: Describe the problem

solo.read_TNR() appears to have performance issues. Wall time spent in function increases faster than linearly with length of time interval of data used for it (the argument).

image

Steps to reproduce:

Simple code for visualizing behaviour.

%
% NOTE: Must mount /data/solo/ first. SolO DB fails without error if directories
% are not available.
%
% NOTE: May have to account for caching of (mounted/non-mounted) CDF files. SolO
% DB also has its own caching(?).
%
function read_TNR___speedTest
  tic
  close all

  solo.db_init('local_file_db', '/data/solo/');
  solo.db_init('db_cache_size_max', 4096)   % Unit: MiB.
  solo.db_cache('on', 'save')

%  N_SEC_ARRAY = 86400 * logspace(log10(0.1), log10(1), 5);
  N_SEC_ARRAY = 86400 * [0.2, 0.5, 1, 2, 3];

  for i = 1:numel(N_SEC_ARRAY)
    nSec         = N_SEC_ARRAY(i);
    tWallTimeSec = measure_read_TNR_walltime(nSec);
    
    fprintf('i            = %i\n', i)
    fprintf('nSec         = %g\n', nSec)
    fprintf('tWallTimeSec = %g\n', tWallTimeSec)

    nSecArray(i)         = nSec;
    tWallTimeSecArray(i) = tWallTimeSec;
  end

  plot(nSecArray/86400, tWallTimeSecArray, 'o-')
  xlabel('Length of time interval with TNR data [days]')
  ylabel('Wall time [s]')
  grid on

  toc
end



function tWallTimeSec = measure_read_TNR_walltime(nSec)
  START_TIME = irf.time_array('2022-12-28T00:00:00.00Z');
  STOP_TIME  = START_TIME + nSec;

  READ_TNR_TINT = [START_TIME; STOP_TIME];

  t = tic();
  TnrData = solo.read_TNR(READ_TNR_TINT);
  tWallTimeSec = toc(t);
end
@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

Problem should exist in below loop:

for ind_sweep = min_sweep:max_sweep
  v1_ = zeros(128, 1);
  p_punt = find(sweep_num == ind_sweep);
  if ~isempty(p_punt)
    for indband = 1:numel(p_punt)
      idx_l = floor(32 * bande_e(p_punt(indband))) + 1;
      idx_r = floor(32 * (bande_e(p_punt(indband)) + 1));
      v1_(idx_l:idx_r) = auto_calib(p_punt(indband), :);
    end
  end

  if sum(v1_) > 0.0    % Might be a way of detecting CDF fill values (large, negative).
    punt0_ = find(v1_ == 0.0);
    if ~isempty(punt0_)
      v1_(punt0_) = NaN;
    end
    v_ = [v_, v1_];
    sweepn_tnr = [sweep_tnr, sweep_num(p_punt(1))];
  end

  if ~isempty(p_punt)
    time_ = [time_, time_rr(min(p_punt))];
  end
end

In particular, the incrementally growing arrays sweepn_tnr and time_ look suspicious.

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

Previous informal test of above loop:

1 day    13 s
2 days   45 s
3 days  121 s

Or maybe it does not scale badly after all?

image

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

Note: Underlying dataset used for testing have very similar sizes.

/data/solo/remote/data/L2/thr> ll -h 2022/12/solo_L2_rpw-tnr-surv-cdag_202212[23]*  2023/01/solo_L2_rpw-tnr-surv-cdag_2023010* | cut -b31-
 326M 2023-11-13 14.57:52 2022/12/solo_L2_rpw-tnr-surv-cdag_20221220_V07.cdf
 305M 2023-11-13 15.02:40 2022/12/solo_L2_rpw-tnr-surv-cdag_20221221_V10.cdf
 328M 2023-11-13 15.07:52 2022/12/solo_L2_rpw-tnr-surv-cdag_20221222_V06.cdf
 288M 2023-11-13 15.12:23 2022/12/solo_L2_rpw-tnr-surv-cdag_20221223_V09.cdf
 328M 2023-11-13 15.17:32 2022/12/solo_L2_rpw-tnr-surv-cdag_20221224_V07.cdf
 328M 2023-11-13 15.22:42 2022/12/solo_L2_rpw-tnr-surv-cdag_20221225_V06.cdf
 327M 2023-11-13 15.27:54 2022/12/solo_L2_rpw-tnr-surv-cdag_20221226_V08.cdf
 326M 2023-11-13 15.32:59 2022/12/solo_L2_rpw-tnr-surv-cdag_20221227_V07.cdf
 327M 2023-11-13 15.38:15 2022/12/solo_L2_rpw-tnr-surv-cdag_20221228_V08.cdf
 328M 2023-11-13 15.43:34 2022/12/solo_L2_rpw-tnr-surv-cdag_20221229_V07.cdf
 328M 2023-11-13 15.48:49 2022/12/solo_L2_rpw-tnr-surv-cdag_20221230_V09.cdf
 328M 2023-11-13 15.53:58 2022/12/solo_L2_rpw-tnr-surv-cdag_20221231_V09.cdf
 328M 2023-11-13 11.44:33 2023/01/solo_L2_rpw-tnr-surv-cdag_20230101_V08.cdf
 327M 2023-11-13 11.49:48 2023/01/solo_L2_rpw-tnr-surv-cdag_20230102_V08.cdf
 326M 2023-11-13 11.54:59 2023/01/solo_L2_rpw-tnr-surv-cdag_20230103_V08.cdf
 220M 2023-11-13 11.58:33 2023/01/solo_L2_rpw-tnr-surv-cdag_20230104_V07.cdf
 328M 2023-11-13 12.03:49 2023/01/solo_L2_rpw-tnr-surv-cdag_20230105_V06.cdf
 328M 2023-11-13 12.09:02 2023/01/solo_L2_rpw-tnr-surv-cdag_20230106_V06.cdf
 328M 2023-11-13 12.14:22 2023/01/solo_L2_rpw-tnr-surv-cdag_20230107_V05.cdf
 328M 2023-11-13 12.19:36 2023/01/solo_L2_rpw-tnr-surv-cdag_20230108_V05.cdf
 277M 2023-11-13 12.24:08 2023/01/solo_L2_rpw-tnr-surv-cdag_20230109_V06.cdf

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

Profiling implies that the problem is something completely different:

profile on; solo.read_TNR(irf.tint('2022-12-28T00:00:00Z/2022-12-29T00:00:00Z')); profile off; profile viewer
image

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

Indeed, the command
p_punt = find(sweep_num == ind_sweep);
does seem to take most of the time, even if commenting out everything else in the loop.

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 9, 2024

From analyzing the code, it seems possible to speed up the code somewhat (factor of ~2?) by making below replacement:

%for ind_sweep = min_sweep:max_sweep
for ind_sweep = unique(sweep_num)'   % NOTE: "sweep_num = sweep_num(sens_)" above.

image

(Not tested if the resulting data is the same though.)

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 10, 2024

Speculative idea: Could maybe parallelize the loop (parfor)?

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 10, 2024

Seems one could simplify (not speed up)

punt0_ = find(v1_ == 0.0);
if ~isempty(punt0_)
  v1_(punt0_) = NaN;
end

to

v_1(v1_==0) = NaN;

.

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 10, 2024

Seems one could almost simplify iteration (not speed up) to use only one if statement since
isempty(p_punt) ==> ~(sum(v1_) > 0.0).

Footnote: Not implementing these ideas myself since (1) I do not want to interfere with Jordi's work on read_TNR(), and (2) because I do not have time to actually check if it works well myself right now. Just collecting ideas.

@JordiBoldu
Copy link
Contributor

Changing

punt0_ = find(v1_ == 0.0);
if ~isempty(punt0_)
v1_(punt0_) = NaN;
end

to

v_1(v1_==0) = NaN;

should work. The code was translated 'literally' from IDL so there are some implementations that are not optimized for matlab.

@ErikPGJ
Copy link
Member Author

ErikPGJ commented Apr 10, 2024

Yes, now when I looked more carefully for it, that sounds it could be a "bad translation".

I did not really intend to analyze read_TNR() for bugs (since that is more for #138), just tried to understand what the loop does in my head. I also note that sweep_tnr and sweepn_tnr are not used for anything. Is it both a typo (should be one and same variable?) and variable(s) that should have been removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants