Slow nccopy in 4.6.3+ compared to 4.4.1.1 #1398

Closed
WardF opened this issue May 21, 2019 · 18 comments · Fixed by #1409

@WardF
Member

WardF commented May 21, 2019

As reported by Martin Schmidt, using a 1.7GB netCDF classic file:

nccopy -k3 -d2 [infile] [outfile] takes approximately 30 seconds using netCDF-C 4.4.1.1, whereas it takes nearly 100 minutes in 4.6.3, 4.7.0 and the current master. Debugging is under way.

WardF added this to the 4.7.1 milestone May 21, 2019
WardF self-assigned this May 21, 2019
@DennisHeimbigner
Collaborator

If the -d2 option is removed, the speed seems OK, so I think the problem is
somehow related to the use of zlib (deflate) compression.

@edhartnett
Contributor

A possibly related fact - my recent performance testing of PIO shows that it's an order of magnitude slower when using compression. This does not agree with what I expect from compression, which is a pretty small slowdown.

@DennisHeimbigner
Collaborator

I assume that deflation was specified to force the output to be in netcdf-4 format,
correct?

@DennisHeimbigner
Collaborator

Ed, is your slowdown related to the use of nccopy, or is it more general?

@edhartnett
Contributor

PIO uses deflate level of 1 by default for netCDF-4 access.

PIO does not use nccopy, it uses the C library directly for data access. So I would look for something that happened to the compression in the C library. I suspect nccopy is just seeing what is happening in the C layer, not adding any slowdown in this case.

@DennisHeimbigner
Collaborator

I had the def_var_chunking code print out the chunksizes being used.
For: float WIND_SPEED(TIME, DEPTH, LATITUDE81_400, LONGITUDE481_840) ;
it appears to use: chunksizes var=WIND_SPEED sizes=356,1,320,360
which either is the default or is actually being specified.
In any case, this appears to be using a single chunk, which might explain the speed
issue.
[Ward can you do ncdump -hs on the file copy to verify the chunk sizes being used?]
I am going to try an experiment in specifying the chunksizes explicitly to see if that makes
a difference.
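
To put the reported chunk shape in perspective, here is a rough sketch of the arithmetic. It assumes WIND_SPEED is a 4-byte float (as declared above) and compares the reported shape with a one-slab-per-time-step shape of (1, 1, 320, 360); the helper name `chunk_bytes` is made up for this illustration. A chunk is the unit of compression and I/O in netCDF-4/HDF5, so touching any part of a chunk means inflating or deflating all of it.

```python
from math import prod

FLOAT_BYTES = 4  # NC_FLOAT is a 32-bit float


def chunk_bytes(chunk_shape, elem_size=FLOAT_BYTES):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return prod(chunk_shape) * elem_size


# Chunk shape printed by the debug output (TIME, DEPTH, LATITUDE, LONGITUDE).
reported = (356, 1, 320, 360)
# Assumed alternative: one horizontal slab per time step.
per_step = (1, 1, 320, 360)

print(f"reported chunk: {chunk_bytes(reported) / 2**20:.1f} MiB")
print(f"per-step chunk: {chunk_bytes(per_step) / 2**10:.1f} KiB")
```

Under these assumptions the single chunk is about 156 MiB uncompressed, against 450 KiB for a per-time-step chunk. Whether that alone accounts for the slowdown is not established by this arithmetic.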

@WardF
Member Author

WardF commented May 21, 2019

The fact that the 4.4.1.1 version has no such slowdown would suggest that something changed, when using compression, between 4.4.1.1 and the current versions. I'll be doing some profiling as soon as I'm done here at the RMHPC symposium.

@WardF
Member Author

WardF commented May 21, 2019

@DennisHeimbigner I'm happy to test that, but regardless of chunk sizes, changing the version of netcdf (and nothing else) sees a time increase from seconds to over an hour. The geometry of the file will no doubt play a part overall, but holding it constant between the netCDF versions makes me think that it is not the primary issue.

@WardF
Member Author

WardF commented May 21, 2019

@DennisHeimbigner the results of the ncdump -hs are attached below.

@WardF
Member Author

WardF commented May 21, 2019

So, I am not entirely certain that the 4.4.1.1 version is actually compressing the data. Let me continue to investigate. Withdrawn.

@DennisHeimbigner
Collaborator

At the moment, I am pretty sure the problem is with the default chunk sizes.
If I force the use of the same default chunk sizes as 4.4.1.1, then the time to copy
is very close to that seen with 4.4.1.1.

@DennisHeimbigner
Collaborator

More specifically, this command simulates the default chunking of 4.4.1.1:

./nccopy -k3 -d2 -c LONGITUDE481_840/360,LATITUDE81_400/320,DEPTH/1,TIME/1 wind_2008.nc ./junkd2.nc
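
When there are several variables or files to test, the `-c` argument can be assembled from a mapping instead of typed by hand. The following is a minimal sketch that only builds and prints the command; the `dim/len,dim/len` syntax and all values are copied from the command above, while the helper name `chunkspec` is hypothetical.

```python
def chunkspec(chunks):
    """Format {dimension_name: chunk_length} as an nccopy -c argument."""
    return ",".join(f"{name}/{length}" for name, length in chunks.items())


# Chunk lengths from the command above: one horizontal slab per time step.
chunks = {
    "LONGITUDE481_840": 360,
    "LATITUDE81_400": 320,
    "DEPTH": 1,
    "TIME": 1,
}

cmd = ["./nccopy", "-k3", "-d2", "-c", chunkspec(chunks),
       "wind_2008.nc", "./junkd2.nc"]
print(" ".join(cmd))
```

The list form of `cmd` could be handed to `subprocess.run` unchanged if the copy should actually be executed and timed.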

@WardF
Member Author

WardF commented May 21, 2019

I stand corrected, we did change the default chunk sizes at some point. So, that's a great lesson learned. I'll follow up with Martin. So, nccopy doesn't inherit the chunk sizes when copying? I can't recall if I knew this, or if I'm surprised by it.

@WardF
Member Author

WardF commented May 21, 2019

Thanks for your help with this Dennis; closing out this issue now as I'm confident you have solved it.

-Ward

WardF closed this as completed May 21, 2019
@edhartnett
Contributor

OK, but is it really solved? Because apparently whatever changes were made to the default chunksizes resulted in a significant slowdown.

Or did I misunderstand what happened with this ticket?

@DennisHeimbigner
Collaborator

I did not mean to imply we solved it. The error is that the default chunking
is being set/computed incorrectly. The question is: what changed and where?
My speculation is that there is some kind of error where specifying deflation
requires chunking on the output and that is being set incorrectly somehow.

@DennisHeimbigner
Collaborator

Ok, I see that there is an error in nccopy to fix.
But AFAIK this is nccopy specific, so I do not know what the PIO issue is.
Ed - can you see if the chunking on those slow files looks reasonable?

@edhartnett
Contributor

Yes, examining the chunking is on the (long) list of things to investigate from the PIO performance effort. I will have many more details and some graphs at the June 3 NOAA meeting, which we are now calling a HPC I/O Workshop.

DennisHeimbigner added a commit that referenced this issue May 21, 2019
re: issue #1398
re: esupport NDY-294972

The new chunking code added to nccopy missed one case.
In the event that there are no chunking specifications
of any kind, and the input is not netcdf-4, and the output
is netcdf-4 and must be chunked, then use the default chunking
that the library computes as part of the nc_def_var() function.

Misc. changes:
1. add some chunking debug code to hdf5var.c
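
The case described in the commit message can be written as a single predicate. This is only an illustration of the stated condition, not the nccopy source; the function and parameter names are hypothetical.

```python
def use_library_default_chunking(has_chunk_spec, input_is_netcdf4,
                                 output_is_netcdf4, output_must_be_chunked):
    """True for the one case the commit message says was missed.

    In that case nccopy should leave the chunk sizes to the defaults that
    the library computes in nc_def_var().
    """
    return (not has_chunk_spec
            and not input_is_netcdf4
            and output_is_netcdf4
            and output_must_be_chunked)


# The reported scenario: classic input, netCDF-4 output with deflation
# (which requires chunked storage), and no -c option given.
print(use_library_default_chunking(
    has_chunk_spec=False,
    input_is_netcdf4=False,
    output_is_netcdf4=True,
    output_must_be_chunked=True,
))
```

Adding an explicit `-c` specification, as in the workaround earlier in the thread, makes the predicate false, so the user-supplied chunk sizes would apply instead.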