JEDI-Based Global land DA Cycling Capabilities #1271

Closed
4 tasks
jiaruidong2017 opened this issue Jan 23, 2023 · 35 comments

@jiaruidong2017
Contributor

jiaruidong2017 commented Jan 23, 2023

To support initialization of land surface for the control member of FV3GFS v17, we need to have the capability to cycle with JEDI-based DA for land components. This issue will be used to track issues associated with adding this capability relevant to the global-workflow.

  • Add Land DA Tasks to Rocoto Workflow
  • Add the following 3 new tasks/jobs:
    jobs/rocoto/landanlinit
    jobs/rocoto/landanlrun
    jobs/rocoto/landanlfinal
  • Add/Modify config.* files for JEDI-based Land DA
  • Changes to config.base to turn on/off land DA
  • Changes to config.resources for all new tasks
  • Addition of new config files for land DA related options
    parm/config/config.landanl
    parm/config/config.landanlinit
    parm/config/config.landanlrun
    parm/config/config.landanlfinal
  • Add Jobs and Scripts for JEDI-Based Land DA
    jobs/JGDAS_GLOBAL_LAND_ANALYSIS_FINALIZE
    jobs/JGDAS_GLOBAL_LAND_ANALYSIS_INITIALIZE
    jobs/JGDAS_GLOBAL_LAND_ANALYSIS_RUN
    scripts/exgdas_global_land_analysis_finalize.py
    scripts/exgdas_global_land_analysis_initialize.py
    scripts/exgdas_global_land_analysis_run.sh
  • Others
    workflow/setup_expt.py
    workflow/applications.py
    workflow/rocoto/workflow_tasks.py
    env/HERA.env
    env/ORION.env
@jiaruidong2017
Contributor Author

I tried to run the setup script to generate an experiment as below:

 $HOMEgfs/workflow/setup_expt.py \
     cycled  \
   --app $APP  \
   --pslot $PSLOT  \
   --configdir $HOMEgfs/parm/config \
   --idate $IDATE \
   --edate $EDATE \
   --resdet $RESDET \
   --resens $RESENS \
   --comrot $COMROT \
   --expdir $EXPDIR  \
   --cdump $CDUMP \
   --icsdir $ICSDIR \
   --gfs_cyc $gfs_cyc \
   --nens 80

The above command, from the wiki page, was working as of a week or two ago, but it failed today with the following errors:

+ /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow/workflow/setup_expt.py cycled --app ATM --pslot v16noahmp --configdir /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow/parm/config --idate 2021032318 --edate 2021032500 --resdet 48 --resens 48 --comrot /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp --expdir /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs --cdump gdas --icsdir /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs/misc --gfs_cyc 1 --nens 80

directory already exists in /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/v16noahmp

Do you wish to over-write [y/N]: y

directory already exists in /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs/v16noahmp

Do you wish to over-write [y/N]: y
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow/workflow/setup_expt.py", line 353, in <module>
    fill_COMROT(host, user_inputs)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow/workflow/setup_expt.py", line 44, in fill_COMROT
    fill_modes[inputs.mode](host, inputs)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow/workflow/setup_expt.py", line 73, in fill_COMROT_cycled
    files = os.listdir(src_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs/misc/enkfgdas.20210323/18/mem001/atmos/INPUT'

@KateFriedman-NOAA @WalterKolczynski-NOAA @aerorahul @CoryMartin-NOAA Do you have any idea on how to fix the problems? Thanks.

@aerorahul
Contributor

As the error says, FileNotFoundError

 ❯❯❯ ls /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs/misc/enkfgdas.20210323/18/mem001/atmos/INPUT                          
/bin/ls: cannot access '/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/para_gfs/misc/enkfgdas.20210323/18/mem001/atmos/INPUT': No such file or directory

@jiaruidong2017
Contributor Author

Thanks @aerorahul for your quick response. The path in the FileNotFoundError above is the path for the initial restart data. Previously, we didn't need to prepare the initial restart data when generating the experiment. Does this mean we now have to have the initial restart data ready before setting up the experiment?

@aerorahul
Contributor

No.
You don't have to prepare the initial data before setting up the experiment.
The --icsdir argument to setup_expt.py is optional. It is provided for users who have initial conditions prepared and staged in the directories that the GFS application expects (for cycled mode).

Retry your command (setup_expt.py) without --icsdir $ICSDIR and let us know if you encounter an error.

@jiaruidong2017
Contributor Author

@aerorahul It works without --icsdir $ICSDIR. Thanks.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA The run still failed at gdasanal as below:

The log file is: /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid3/logs/2019061406/gdasanal.log. I copied the error messages from the log file and pasted them below. Do you have any quick suggestions or comments? Thanks.

+ exglobal_atmos_analysis.sh[666]: ncmd=29
+ exglobal_atmos_analysis.sh[667]: '[' 29 -gt 0 ']'
+ exglobal_atmos_analysis.sh[668]: ncmd_max=29
++ exglobal_atmos_analysis.sh[669]: eval echo srun -l --export=ALL -n '$ncmd' --multi-prog
+++ exglobal_atmos_analysis.sh[669]: echo srun -l --export=ALL -n 29 --multi-prog
+ exglobal_atmos_analysis.sh[669]: APRUNCFP_UNZIP='srun -l --export=ALL -n 29 --multi-prog'
+ exglobal_atmos_analysis.sh[670]: srun -l --export=ALL -n 29 --multi-prog /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid3/anal.183671/mp_unzip.sh
srun: Warning: can't honor --ntasks-per-node set to 8 which doesn't match the requested tasks 29 with the number of requested nodes 11. Ignoring --ntasks-per-node.
 2: /bin/mv: cannot stat 'diag_sndrd3_g15_ges.2019061400.nc4': No such file or directory
 0: /bin/mv: cannot stat 'diag_sndrd1_g15_ges.2019061400.nc4': No such file or directory
 1: /bin/mv: cannot stat 'diag_sndrd2_g15_ges.2019061400.nc4': No such file or directory
 3: /bin/mv: cannot stat 'diag_sndrd4_g15_ges.2019061400.nc4': No such file or directory
25: /bin/mv: cannot stat 'diag_mhs_metop-b_ges.2019061400.nc4': No such file or directory
27: /bin/mv: cannot stat 'diag_avhrr_n18_ges.2019061400.nc4': No such file or directory
28: /bin/mv: cannot stat 'diag_avhrr_metop-a_ges.2019061400.nc4': No such file or directory
18: /bin/mv: cannot stat 'diag_seviri_m11_ges.2019061400.nc4': No such file or directory
10: /bin/mv: cannot stat 'diag_mhs_metop-a_ges.2019061400.nc4': No such file or directory
srun: error: h11c19: tasks 0-2: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=41606321.0
srun: error: h24c40: tasks 27-28: Exited with exit code 1
srun: error: h11c21: tasks 6-8: Terminated
16: /bin/mv: cannot stat 'diag_mhs_n19_ges.2019061400.nc4': No such file or directory
24: /bin/mv: cannot stat 'diag_amsua_metop-b_ges.2019061400.nc4': No such file or directory
15: /bin/mv: cannot stat 'diag_amsua_n19_ges.2019061400.nc4': No such file or directory
17: /bin/mv: cannot stat 'diag_seviri_m08_ges.2019061400.nc4': No such file or directory
23: /bin/mv: cannot stat 'diag_hirs4_metop-b_ges.2019061400.nc4': No such file or directory
srun: error: h22c49: tasks 23-24: Exited with exit code 1
srun: error: h22c18: tasks 15-17: Exited with exit code 1
srun: error: h23c03: task 25: Exited with exit code 1
srun: error: h23c03: task 26: Terminated
srun: error: h11c20: task 3: Exited with exit code 1
srun: error: h11c20: tasks 4-5: Terminated
srun: error: h22c29: tasks 21-22: Terminated
srun: error: h22c27: task 18: Exited with exit code 1
srun: error: h22c27: tasks 19-20: Terminated
srun: error: h11c22: tasks 9,11: Terminated
srun: error: h11c22: task 10: Exited with exit code 1
srun: error: h22c11: tasks 12-14: Terminated
srun: Force Terminated StepId=41606321.0
+ exglobal_atmos_analysis.sh[1]: postamble exglobal_atmos_analysis.sh 1675112913 143
+ preamble.sh[68]: set +x
End exglobal_atmos_analysis.sh at 21:08:49 with error code 143 (time elapsed: 00:00:16)
+ JGLOBAL_ATMOS_ANALYSIS[1]: postamble JGLOBAL_ATMOS_ANALYSIS 1675112906 143
+ preamble.sh[68]: set +x
End JGLOBAL_ATMOS_ANALYSIS at 21:08:49 with error code 143 (time elapsed: 00:00:23)
+ anal.sh[1]: postamble anal.sh 1675112904 143
+ preamble.sh[68]: set +x
End anal.sh at 21:08:49 with error code 143 (time elapsed: 00:00:25)

@CoryMartin-NOAA
Contributor

Yes, this is because of #1005, which has not yet been fixed. The easiest way to get this going is to go into your $EXPDIR/config.anal file and comment out lines 15 and 17.

@jiaruidong2017
Contributor Author

Thanks @CoryMartin-NOAA I will test it.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA After making the changes, the run failed again with the new error messages below: the executable gsi.x is not found. When I built the global-workflow, I chose to build the GDASApp and skipped building the GSI and EnKF, so the executable gsi.x is not available. Do I need to rebuild the global-workflow package with the GSI and EnKF included?

+ exglobal_atmos_analysis.sh[920]: export OMP_NUM_THREADS=5
+ exglobal_atmos_analysis.sh[920]: OMP_NUM_THREADS=5
+ exglobal_atmos_analysis.sh[921]: export pgm=/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x
+ exglobal_atmos_analysis.sh[921]: pgm=/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x
+ exglobal_atmos_analysis.sh[922]: . prep_step
++ prep_step[3]: '[' -n /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x ']'
++ prep_step[3]: '[' -n OUTPUT.110380 ']'
++ prep_step[4]: echo /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x
++ prep_step[7]: '[' -f errfile ']'
++ prep_step[11]: export FORT01=0
++ prep_step[11]: FORT01=0
+++ prep_step[12]: env
+++ prep_step[12]: grep '^FORT[0-9]\{1,\}='
+++ prep_step[12]: awk -F= '{print $1}'
++ prep_step[12]: unset FORT01
+ exglobal_atmos_analysis.sh[924]: /bin/cp -p /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid4/anal.110014
/bin/cp: cannot stat '/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/exec/gsi.x': No such file or directory
+ exglobal_atmos_analysis.sh[1]: postamble exglobal_atmos_analysis.sh 1675176628 1
+ preamble.sh[68]: set +x
End exglobal_atmos_analysis.sh at 14:50:34 with error code 1 (time elapsed: 00:00:06)
+ JGLOBAL_ATMOS_ANALYSIS[1]: postamble JGLOBAL_ATMOS_ANALYSIS 1675176615 1
+ preamble.sh[68]: set +x
End JGLOBAL_ATMOS_ANALYSIS at 14:50:34 with error code 1 (time elapsed: 00:00:19)
+ anal.sh[1]: postamble anal.sh 1675176612 1
+ preamble.sh[68]: set +x
End anal.sh at 14:50:34 with error code 1 (time elapsed: 00:00:22)

@CoryMartin-NOAA
Contributor

Yes, use -u -g when running checkout to include both the GSI and UFS DA (GDASApp).

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA Okay thanks.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA I rebooted the gdasaeroanlinit task several times, and it still failed with the error messages below. Any suggestions? Thanks.

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/scripts/exglobal_aero_analysis_initialize.py", line 25, in <module>
    AeroAnl.initialize()
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/ush/python/pygw/src/pygw/logger.py", line 261, in wrapper
    retval = func(*args, **kwargs)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref2/ush/python/pygfs/task/aero_analysis.py", line 83, in initialize
    FileHandler(self._get_bkg_dict(AttrDict(self.task_config, **self.task_config))).sync()
AttributeError: 'AerosolAnalysis' object has no attribute '_get_bkg_dict'
+ JGLOBAL_AERO_ANALYSIS_INITIALIZE[1]: postamble JGLOBAL_AERO_ANALYSIS_INITIALIZE 1675184831 1
+ preamble.sh[68]: set +x
End JGLOBAL_AERO_ANALYSIS_INITIALIZE at 17:07:18 with error code 1 (time elapsed: 00:00:07)
+ aeroanlinit.sh[1]: postamble aeroanlinit.sh 1675184824 1
+ preamble.sh[68]: set +x
End aeroanlinit.sh at 17:07:18 with error code 1 (time elapsed: 00:00:14)

@CoryMartin-NOAA
Contributor

@jiaruidong2017 recent commits to the branch have caused the cycling to fail, which is why my PR is now a draft again. You'll have to hold off (or go to a prior commit of the workflow) in order to get it to cycle to completion.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA Yes, I updated both the develop branch and your branch in this test. Thanks for the information.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA I updated and rebuilt your branch feature/aeroda_staticb_crm. After I submitted the run, it failed. I checked the log file gdasprep.log, and the error message is below:

+ getdump.sh[33](2019061406): for file in '$(ls ${prefix}*)'
+ getdump.sh[34](2019061406): ln -fs /scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d /scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d
ln: '/scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d' and '/scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d' are the same file
+ getdump.sh[1](2019061406): postamble getdump.sh 1675903186 1
+ preamble.sh[68](2019061406): set +x
End getdump.sh at 00:39:47 with error code 1 (time elapsed: 00:00:01)
+ prep.sh[1]: postamble prep.sh 1675903182 1
+ preamble.sh[68]: set +x
End prep.sh at 00:39:47 with error code 1 (time elapsed: 00:00:05)

Any suggestions? Thanks.

@CoryMartin-NOAA
Contributor

@jiaruidong2017 That is weird; it looks like your run is configured incorrectly. I suggest checking what your ROTDIR is.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA My ROTDIR in aodid4.xml is as below:

<!ENTITY ROTDIR "/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid4">

@CoryMartin-NOAA
Contributor

What is it in config.base? That error suggests it is trying to link files from the DMPDIR to the DMPDIR.

@jiaruidong2017
Contributor Author

jiaruidong2017 commented Feb 13, 2023

@CoryMartin-NOAA I built the global-workflow today from the develop branch and ran it, but the run failed at gdasaeroanlinit. The log file shows the error messages below:

2023-02-13 16:41:58,560 - DEBUG    - analysis    :  returning: {'mkdir': ['/scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/obs'], 'copy': [['/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/06/obs/gdas.t06z.viirs_npp.2019061406.nc4', '/scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/obs/gdas.t06z.viirs_npp.2019061406.nc4']]}
Created /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/obs
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref6/ush/python/pygw/src/pygw/fsutils.py", line 69, in cp
    shutil.copyfile(source, target)
  File "/scratch1/NCEPDEV/da/python/opt/core/miniconda3/4.6.14/envs/gdasapp/lib/python3.7/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/06/obs/gdas.t06z.viirs_npp.2019061406.nc4'

During my previous runs without recent updates, the obs directory in the comrot path (e.g., /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/06/obs) was created and the obs files were linked in the obs directory (see below for different runs).

/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid3/gdas.20190614/00/:
atmos  chem  obs

/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid4/gdas.20190614/00/:
atmos  chem  obs

/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid5/gdas.20190614/00/:
atmos  chem  obs

/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/00/:
atmos  chem

The aodid3 used the develop branch as below:

commit d8c1bd5dfb6b2654b5b8c5121af68f7473fac26e (HEAD -> develop, origin/develop, origin/HEAD)
Author: Kate Friedman <[email protected]>
Date:   Fri Feb 10 12:26:30 2023 -0500

    Update RTD GFS operational version to v16.3.6 (#1305)

    Update the status of operations to the newly implemented v16.3.6 version on the read-the-docs main page.

    Refs #1278

The aodid4 used your branch feature/aeroda_staticb_crm as below:

commit 68d5ca547155e5138bdfc83d8744d595f615db8a (HEAD -> feature/aeroda_staticb_crm, origin/feature/aeroda_staticb_crm)
Author: Cory Martin <[email protected]>
Date:   Tue Feb 7 18:51:34 2023 +0000

    Fix PYTHONPATH

The aodid5 used your code as below:

commit 6c5ae54d8987deb1d9f430fbc0a615c4c93a395e (HEAD -> feature/aeroda_staticb_crm, origin/feature/aeroda_staticb_crm)
Author: CoryMartin-NOAA <[email protected]>
Date:   Mon Feb 6 17:50:53 2023 +0000

    Working now need to clean up

The aodid6 used the develop branch from the current global-workflow code as below:

commit 1040216d8a4efb9955efecebf59775e91d8845e2 (HEAD -> develop, origin/develop, origin/HEAD)
Author: Cory Martin <[email protected]>
Date:   Fri Feb 10 17:16:44 2023 -0500

    Add in initial 3DVar aerosol DA cycling capability (#1106)

@CoryMartin-NOAA Am I missing anything? Any suggestions? Thank you very much.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA I conducted the above global-workflow run again, and found that it failed at a different step, gdasaeroanlrun, this time.

For this run, the obs directories were created as below:

> ls /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/00
atmos  chem  obs
> ls /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/06/
atmos  chem  obs

I checked the log file, and the error messages are below:

76: FATAL from PE    76: fms_io(restore_state_all): unable to find any restart files specified by
/scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/bkg/20190614.060000.fv_core.res.tile2.nc

The logfile on hera is /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/logs/2019061406/gdasaeroanlrun.log

@jiaruidong2017
Contributor Author

jiaruidong2017 commented Feb 14, 2023

The log file /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/logs/2019061406/gdasaeroanlinit.log shows:

Created /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/bkg
Copied /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid6/gdas.20190614/00/atmos/RESTART/20190614.060000.coupler.res to /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid6/gdasaeroanl_06/bkg/20190614.060000.coupler.res

I didn't find any statements showing the restart files being copied to the bkg directory. The only file copied to the bkg directory was 20190614.060000.coupler.res.

gdasaeroanlinit.log
gdasaeroanlrun.log

@CoryMartin-NOAA
Contributor

@jiaruidong2017 something weird is going on... for some reason it looks like it is not adding the right files to the list to copy. Can you add the following to your ush/python/pygfs/task/aero_analysis.py file?

Line 227:

        for itile in range(1, task_config.ntiles + 1):

right before this, can you add

print(task_config.ntiles)

and then rerun the gdasaeroanlinit job? I am curious to see what this prints as... tagging @aerorahul for awareness

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA It prints as an empty {}.

@CoryMartin-NOAA
Contributor

OK, I will need to investigate further. @aerorahul, is it possible there is an issue with the AttrDict implementation?

@aerorahul
Contributor

I think a bug has been found.

Please replace

super().__init__(config, ntiles=6)

with

super().__init__(config)
self.config.ntiles = 6

in https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/python/pygfs/task/analysis.py#L25
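The symptom (ntiles printing as an empty {}) is consistent with the keyword argument being dropped before it ever reaches the config. A minimal sketch of the bug pattern and the fix, using simplified stand-in classes rather than the actual pygw/pygfs code:

```python
class AttrDict(dict):
    """Simplified stand-in for an attribute-accessible dict (not the pygw one)."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # missing keys come back as an empty dict, matching the {} printed above
            return AttrDict()

    def __setattr__(self, key, value):
        self[key] = value


class Task:
    """Hypothetical base class illustrating the bug pattern only."""

    def __init__(self, config, **kwargs):
        # Bug pattern: extra keyword arguments (e.g. ntiles=6) are accepted
        # here but never merged into self.config, so they are silently lost.
        self.config = AttrDict(config)


# Before the fix: ntiles never reaches the config
buggy = Task({"cdump": "gdas"}, ntiles=6)
print(buggy.config.ntiles)   # an empty AttrDict: {}

# After the fix suggested above: assign on the config directly
fixed = Task({"cdump": "gdas"})
fixed.config.ntiles = 6
print(fixed.config.ntiles)   # 6
```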

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA I reran the gdasaeroanlinit job with the changes suggested by @aerorahul. task_config.ntiles is 6 now.

@CoryMartin-NOAA
Contributor

@jiaruidong2017 thanks for the test and @aerorahul thanks for the bugfix! I think @andytangborn has found another bug that we should combine into one PR.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA I rebuilt the global-workflow from the develop branch and reran. The run failed at gdasaeroanlinit again. The error message below from the log file shows it failed at the beginning of the script:

Begin aeroanlinit.sh at Thu Feb 16 01:55:14 UTC 2023
+ aeroanlinit.sh[7]: . /scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/load_ufsda_modules.sh
/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/load_ufsda_modules.sh: line 4: DEBUG_WORKFLOW: unbound variable
+++ load_ufsda_modules.sh[1]: postamble aeroanlinit.sh 1676512514 1
+++ preamble.sh[68]: set +x
End aeroanlinit.sh at 01:55:14 with error code 1 (time elapsed: 00:00:00)

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA The run failed at gdasaeroanlinit again. The error messages from the log file (gdasaeroanlinit.log) are below:

2023-02-16 15:56:11,445 - DEBUG    - analysis    :  returning: {'mkdir': ['/scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid7/gdasaeroanl_06/obs'], 'copy': [['/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid7/gdas.20190614/06/obs/gdas.t06z.viirs_npp.2019061406.nc4', '/scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid7/gdasaeroanl_06/obs/gdas.t06z.viirs_npp.2019061406.nc4']]}
Created /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid7/gdasaeroanl_06/obs
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygw/src/pygw/fsutils.py", line 69, in cp
    shutil.copyfile(source, target)
  File "/scratch1/NCEPDEV/da/python/opt/core/miniconda3/4.6.14/envs/gdasapp/lib/python3.7/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid7/gdas.20190614/06/obs/gdas.t06z.viirs_npp.2019061406.nc4'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/scripts/exglobal_aero_analysis_initialize.py", line 25, in <module>
    AeroAnl.initialize()
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygw/src/pygw/logger.py", line 261, in wrapper
    retval = func(*args, **kwargs)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygfs/task/aero_analysis.py", line 71, in initialize
    super().initialize()
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygfs/task/analysis.py", line 32, in initialize
    FileHandler(obs_dict).sync()
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygw/src/pygw/file_utils.py", line 39, in sync
    sync_factory[action](files)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygw/src/pygw/file_utils.py", line 59, in _copy_files
    cp(src, dest)
  File "/scratch1/NCEPDEV/global/Jiarui.Dong/JEDI/GlobalWorkflow/global-workflow.ref7/ush/python/pygw/src/pygw/fsutils.py", line 71, in cp
    raise OSError(f"unable to copy {source} to {target}")
OSError: unable to copy /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid7/gdas.20190614/06/obs/gdas.t06z.viirs_npp.2019061406.nc4 to /scratch1/NCEPDEV/stmp2/Jiarui.Dong/RUNDIRS/aodid7/gdasaeroanl_06/obs/gdas.t06z.viirs_npp.2019061406.nc4
+ JGLOBAL_AERO_ANALYSIS_INITIALIZE[1]: postamble JGLOBAL_AERO_ANALYSIS_INITIALIZE 1676580956 1
+ preamble.sh[68]: set +x
End JGLOBAL_AERO_ANALYSIS_INITIALIZE at 20:56:11 with error code 1 (time elapsed: 00:00:15)
+ aeroanlinit.sh[1]: postamble aeroanlinit.sh 1676580951 1
+ preamble.sh[68]: set +x
End aeroanlinit.sh at 20:56:11 with error code 1 (time elapsed: 00:00:20)

It seems the error results from copying the observation data file. As I mentioned earlier today, after I rebooted gdasaeroanlinit, the run could continue, but it failed at gdasaeroanlfinal later. @CoryMartin-NOAA Do you have any suggestions on how to deal with this issue? Thanks.

@CoryMartin-NOAA
Contributor

@jiaruidong2017 This looks like something wrong in your configuration, probably DMPDIR; it should not be copying obs from a ptmp path in your global space.

@jiaruidong2017
Contributor Author

jiaruidong2017 commented Feb 17, 2023

@CoryMartin-NOAA In my config.base, I did find a difference in COMIN_OBS and COMIN_GES_OBS compared to your config.base, as below:

My config.base is:

export COMIN_OBS=${COMIN_OBS:-${ROTDIR}/${CDUMP}.${PDY}/${cyc}/obs}
export COMIN_GES_OBS=${COMIN_GES_OBS:-${ROTDIR}/${CDUMP}.${PDY}/${cyc}/obs}

Your config.base is:

export COMIN_OBS=${DMPDIR}/${CDUMP}.${PDY}/$cyc/atmos
export COMIN_GES_OBS=${DMPDIR}/${CDUMP}.${PDY}/$cyc/atmos

My config.base uses ROTDIR for COMIN_OBS and COMIN_GES_OBS; the use of ROTDIR comes from the recent updates. When I rebooted gdasaeroanlinit, the run succeeded. I then checked my ROTDIR and found the observation data there, as below:

> ls /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid7/gdas.20190614/00/
atmos  chem  obs
>ls /scratch1/NCEPDEV/global/Jiarui.Dong/ptmp/aodid7/gdas.20190614/06/
atmos  chem  obs

It seems that the gdasaeroanlinit task executes before the obs directory is copied to my ROTDIR. Do you have any suggestions? Thanks.

@jiaruidong2017
Contributor Author

@CoryMartin-NOAA If I use your config.base settings for COMIN_OBS and COMIN_GES_OBS, the run fails at gdasprep instead. Although the gdasaeroanlinit run succeeded this time, the gdasprep run failed. The error message in gdasprep.log is below:

+ getdump.sh[34](2019061406): ln -fs /scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d /scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d
ln: '/scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d' and '/scratch1/NCEPDEV/da/Cory.R.Martin/GEFS-Aero/glopara_dump/gdas.20190614/06/atmos/gdas.t06z.1bamua.tm00.bufr_d' are the same file
+ getdump.sh[1](2019061406): postamble getdump.sh 1676700511 1
+ preamble.sh[68](2019061406): set +x
End getdump.sh at 06:08:32 with error code 1 (time elapsed: 00:00:01)
+ prep.sh[1]: postamble prep.sh 1676700508 1
+ preamble.sh[68]: set +x
End prep.sh at 06:08:32 with error code 1 (time elapsed: 00:00:04)

The error results from trying to link the obs data from DMPDIR to DMPDIR. I think the obs data should be linked from DMPDIR to ROTDIR at the gdasprep step, so COMIN_OBS and COMIN_GES_OBS should use ROTDIR in the config.base setup. Any suggestions? Thanks.

@CoryMartin-NOAA
Contributor

@jiaruidong2017 See #1309; this just changed 3 days ago. We still need to figure out how to fix this for our new work.

@WalterKolczynski-NOAA
Contributor

Should this issue be an Epic with smaller issues for discrete parts? @aerorahul @CoryMartin-NOAA @jiaruidong2017

@jiaruidong2017
Contributor Author

@WalterKolczynski-NOAA Yes; following the suggestion by @CoryMartin-NOAA, I am working on smaller PRs for this issue.
