Problem with making large dat file as INPUT #17

Open
Jacqueline470 opened this issue May 29, 2024 · 3 comments

@Jacqueline470

Hi, I am trying to create a 30 GB .dat file as input for DINEOF, writing it from MATLAB with gwrite.m.
The array looks fine as a numerical matrix in MATLAB, with a mean value of about 0.6. However, when DINEOF read the file, it produced the following output:

STATR DINEOF!
 ********************************************************************
 Numerical data read
 You entered the values:
 number of EOF modes you want to compute          50
 maximal size for the Krylov subspace          56
 
 You asked not to normalise the input matrices
 
 The right and left EOFs will be written in directory ./
 
 ********************************************************************
 
 You entered filenames Chla_2011-2020_1220_100PCT_data.dat
                       Chla_2011-2020_1220_100PCT_mask.dat
 
 initfilename: 
 dineof.init                                                                    
                                                                                
                                           
 
 ********************************************************************
 Now some statistics about your data:
 
  Number of mask land points:                                 2492609
        Dimension of file  1:                  2600  x  2700  x  1220
 
 
                        Mean:                                  0.0000
          Standard deviation:                                  0.0000
 
 ***
 Matrix loaded ... Land points extracted...
 
 Size of the matrix used in DINEOF:                  4527391 x   1220
 ***
 
 Missing data:                         0 out of  1228449724 (  0.00%)
 
 Number of cross validation points                           55234210
 ********************************************************************
 
???
DINEOF start time : Wed May 29 08:23:40 CST 2024
DINEOF  end  time : Wed May 29 08:27:08 CST 2024
DINEOF total time : 3  minutes

The mean and standard deviation of the data are both reported as zero, so I suspect something went wrong when saving the .dat file. However, no warnings or error messages appeared.
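
A quick arithmetic check on the numbers in the log above, offered as a sketch rather than a diagnosis. It recomputes the matrix size from the grid dimensions and the land-point count, then shows what the total number of entries becomes if only its low 32 bits are kept. The 32-bit storage and the 4-byte element size are assumptions; nothing in this thread confirms where, or whether, such a limit exists in gwrite.m or DINEOF.

```python
# Dimensions and counts copied from the DINEOF log above.
nx, ny, nt = 2600, 2700, 1220
land_points = 2_492_609

sea_points = nx * ny - land_points   # rows of the matrix used in DINEOF
true_total = sea_points * nt         # total number of matrix entries

# Assumption: the total is stored in a 32-bit integer, which keeps only
# the low 32 bits of the true value.
wrapped_total = true_total % 2**32

print(sea_points)     # compare with "Size of the matrix used in DINEOF"
print(true_total)
print(wrapped_total)  # compare with "Missing data: 0 out of ..."

# Size of the full, unmasked input, assuming 4-byte reals (an assumption).
file_elements = nx * ny * nt
print(file_elements, file_elements * 4 / 1e9, "GB")
```

The wrapped value equals the total printed on the "Missing data" line (1228449724), while the true total is 5523417020. That is consistent with a 32-bit counter overflowing somewhere in the processing chain, although it does not by itself explain the zero mean.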

Any help would be appreciated.
Thank you.

@ctroupin
Collaborator

Hello,
could you try writing the dataset for a smaller area and a shorter time period, then make the file available so we can check its content more easily and see whether the problem also occurs with a small file?

Another option is to work with netCDF instead of the binary format created by gwrite.m.

@Jacqueline470
Author

Hi, we tried writing the dataset for a shorter time period. It worked correctly in DINEOF, and here is the output:

STATR DINEOF!
 ********************************************************************
 Numerical data read
 You entered the values:
 number of EOF modes you want to compute          50
 maximal size for the Krylov subspace          56
 
 You asked not to normalise the input matrices
 
 The right and left EOFs will be written in directory ./
 
 ********************************************************************
 
 You entered filenames NDATA_2011.dat
                       NMASK_2011.dat
 
 initfilename: 
 dineof.init                                                                    
                                                                                
                                           
 
 ********************************************************************
 Now some statistics about your data:
 
  Number of mask land points:                                 2730343
        Dimension of file  1:                  2600  x  2700  x   117
 
 
                        Mean:                                  0.4334
          Standard deviation:                                  0.6631
 
 ***
 Matrix loaded ... Land points extracted...
 
 Size of the matrix used in DINEOF:                  4289657 x    117
 ***
 
 Missing data:                 296540728 out of   501889869 ( 59.08%)
 
 Number of cross validation points                            5018938
 ********************************************************************
 
 Time (in minutes) for 1 EOF mode calculation in DINEOF         0.5837
 
 # EOF modes asked:  50        Convergence level required: 0.1E-02
 
 Starting with the EOF mode calculation...
 
 EOF mode    Expected Error    Iterations made   Convergence achieved
 ________    ______________    _______________   ____________________
 
      1              0.3153                 63             0.9969E-03
      2              0.3107                107             0.9975E-03
      3              0.3029                190             0.9989E-03
      4              0.2997                229             0.9975E-03
      5              0.2982                150             0.9991E-03
      6              0.2947                165             0.9960E-03
      7              0.2940                188             0.9992E-03
      8              0.3005                238             0.9980E-03
      9              0.3083                200             0.9996E-03
     10              0.3153                163             0.9986E-03
 
  Minimum reached in cross-validation
  Number of optimal EOF modes:            7
 
  Make last reconstruction, including data put aside for cross-validation
 
      7              0.2940                188             0.9992E-03
 
 DINEOF finished!
 
number of eigenvalues retained for the reconstruction   7
         expected error calculated by cross-validation          0.2940
            total time (in minutes) in lanczos process        275.2208
 
 Now writing data...
 
 
 total variance of the initial matrix   0.439738436920921     
 total variance of the reconstructed matrix   0.855000592278757     
 
 
 ...done!

 
DINEOF start time : Thu May 23 19:22:03 CST 2024
DINEOF  end  time : Fri May 24 00:00:43 CST 2024
DINEOF total time : 278  minutes

However, in the end we have to work with a dataset covering a long time period and a larger area to achieve our research goal, so we may try netCDF instead of the binary format.
Thanks for your advice :)

@ctroupin
Collaborator

Good to know it works at least for a shorter time period / smaller domain. It is difficult to say if there is an issue when writing a very large file with gwrite.
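
One way to narrow this down is to bracket the file size at which the problem first appears. The sketch below assumes, purely as a hypothesis, that some step in the chain counts elements with a signed 32-bit integer; the helper names `element_count` and `max_time_steps` are illustrative and not part of gwrite.m or DINEOF. It compares the two datasets from this thread against that limit and computes the largest number of time steps that would stay under it for the 2600 x 2700 grid.

```python
INT32_MAX = 2**31 - 1  # hypothetical limit: signed 32-bit element count


def element_count(nx, ny, nt):
    """Number of values in a full nx x ny x nt array (mask not applied)."""
    return nx * ny * nt


def max_time_steps(nx, ny, limit=INT32_MAX):
    """Largest nt for which nx * ny * nt does not exceed the limit."""
    return limit // (nx * ny)


nx, ny = 2600, 2700

# The two cases reported in this thread: 117 steps worked, 1220 did not.
for nt in (117, 1220):
    n = element_count(nx, ny, nt)
    print(nt, n, "within limit" if n <= INT32_MAX else "exceeds limit")

# Number of time steps at which the hypothetical limit would be crossed.
threshold = max_time_steps(nx, ny)
print(threshold, threshold + 1)
```

If the hypothesis holds, a file with `threshold` time steps should still give a non-zero mean in DINEOF and one with `threshold + 1` should not. If both behave the same way, the cause lies elsewhere.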
