-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data layout / BP-X #1661
Comments
@germasch do you have a timeline of when XGC would be ready for this? Also, I am very wary of adding API functionality when there is a solution in place. APIs can get very bloated pretty easily in any library and over time it becomes very hard to not only maintain, but to attract new projects. Back to the issue, exposing My point is: adios2 already provides a solution following standardized convention for each language, but gives you info if the dimensions were reversed at read so you know the original data layout in terms of fast/slow indices. I am curious to see real code comparing the current solution and another one adding APIs. I personally feel more confused when handling new functions I need to learn than just going with something standard. Having code will also allow asking the practitioners' opinion based on their experience and which approach is easier to understand. I will write in another box to compare side-by-side. |
Variable is written as
Also, with the proposed functions we still must go down to the block level
Read: (one block is row-major and the other column-major)
|
I can't really predict what's going to happen with XGC, as I said there's currently work underway of rewriting part of the code into C++, but I don't think I/O is part of that current effort. I do have code to show, and that's from my own code (PSC, a particle-in-cell code): inline kg::io::Dims makeDims(int m, const Int3& dims)
{
return kg::io::Dims{size_t(m), size_t(dims[2]), size_t(dims[1]),
size_t(dims[0])};
} There is some wrapping layer involved, but you get the idea: While the data is indexed as adios2 doesn't offer a solution to this issue currently, that's why I reverse the dims myself. Obviously, since I'm lying to adios2, it has no idea what's really going on, so there is no way that it could tell a visualization tool that the dims really should be the other way around. Reading the data back in the code itself is basically not an issue, because I'm just reversing the dims there as well. But when, e.g., accessing it via python, the indices are reversed, and that's unpleasant. Most applications deal with the row-major / column-major the same way, that is, if your data doesn't match the layout you're using, you just reverse the dims, because the alternative of transposing the data is expensive. That's what you got to do when using HDF5 from Fortran, or, apparently, when using vtk with row-major arrays. I'm not arguing with that, that's the reasonable thing to do, but when one has to reverse dims for this reason, it'd be nice to have a way to tell that's happened. I take it from what @berkgevici wrote in the other thread, that while vtk may always use col-major arrays, it stores the info that the original layout is row-major so that the data can be displayed correctly in terms of x,y,z, order. That's the main thing I'm asking for, and it's not supported for my use case (above) in the current API. How one wants to handle the mismatching data layout case (if at all) is up to the application. In fortran, you don't really have a choice. In C++, you can choose to always do row-major -- but you can also choose to use an multi-dimensional array class that can support both layouts natively. The same thing goes for Python -- ndarrays do support both data layouts natively. If you choose to use an multi-dimensional array that supports both layouts, it obviously makes sense to use that functionality, because then you can just access things in the natural index order and not worry about what the underlying layout is.
Sure, some people in C++ use Just to be clear, what I've proposed is an addition to the API which is likely going to be used only by a select group of users, e.g., people dealing with generic visualization tools, or people like me, who use column major arrays in C++ and like them to be represented correctly in their output. Most people will be just fine doing Fortran and column major, and won't see any difference. |
You got the idea. The way it's written, you get 10x20 col-major data. If one didn't do the It's really just an extension of current behavior: If the data layout you're reading into is the same that the data was written with, you get that data layout with the original dims. If you're reading the data into the opposite layout, the dims are reversed, the data is the same. The difference is only that instead of assuming "C++/Python -> layout must be row-major", you get a chance to tell adios2 what it actually is.
I think allowing for different layouts at the block level is overengineering at best, and trying to implement that correctly would make my head spin. But I don't see any possible application where this could be beneficial, either. So the data layout should be a property of a given variable, not of its block. I think I said that, by the way, when you added an API that provides IsReverseDims on the block level, I can't come up with any use case where it wouldn't be the same for all blocks of a given variable. So changing the layout in the middle of a write would be an error, ideally one which gets caught, but at least undefined behavior. Note that this is no different then changing other properties of a variable in the middle of a write (say, you wrote one block to a variable, and then you called |
Let's clarify, adios2 needs to know which is the fastest and the slowest index adopting language convention, that's all adios2 asks. Ultimately, it becomes either calling something like current:
or:
if that's the case, then it's easier (and less costly) to place this functionality in the workflow of the selected group of users instead of adding more APIs in adios2. The end result is the same, we can help providing the required support. |
I think it's important to check with different communities and their execution plans for the future and current architectures. This is why I was curious about XGC plans. Are they planning to use NVLink on Summit? Can some blocks be written in Fortran and some from the host C++ code running with CUDA/hip device code? (these are questions, I am not assuming much at this stage). There are new things that would come out from the learning process of working with apps. |
"Lying" was a bit tongue-in-cheek. But the point is, I'm giving a shape to adios2 which isn't the actual true shape, and adios2 has no way of knowing that I did that.
Sure, in terms of the length of the code, the two are about the same. In terms of looking at the code and understanding what it does, the second one is definitely much clearer (there's a coding principle saying that code gets written once, but read many times, so one should keep that in mind). But those are subjective. The main point is, if I reversed the dims, adios2 doesn't have that info. So if someone else reads the data / visualizes the data / loads it into python, they end up with the reversed dims, not the original ones.
See above: If you don't give the information to adios2, there is no way that it can be stored and retrieved back. In terms of maintenance cost, I think there's a pretty good chance that implementing this feature would actually lead to a reduction in lines of code, since reversing dims or not would be decided and done at the core level, so it wouldn't need to be done by each engine, removing duplicated logic. (However, getting there wouldn't be entirely trivial.) |
I'm not sure what you mean? I don't think there are any plans beyond some GPU-aware MPI (GPUdirect) in terms of communication. If you mean GPUdirect Storage, no, I don't think anyone is thinking about that, though I guess it's something that you guys could be thinking about, because that's hopefully something that applications won't have to deal with themselves. In terms of writing some blocks from Fortran and some from C++, I could see that happening theoretically in some apps, but I wouldn't anticipate it for XGC. Besides, concurrent use of the C++ and Fortran API isn't supported by adios2, anyway. If one wanted to go really fancy, I could see some case for having different data layouts in Fortran vs CUDA code, though not so much about row-major vs col-major, but rather about AOS vs SOA. Which I think might have some interesting opportunities, but that's beyond this topic. |
I recently put VTK reader for adios2 image data and unstructured grids, and this info was never needed (HDF5 readers would never work), 'cause I am not assuming that A is A(x,y,z), this is purely application dependent (in fact, it could be misleading as explained in previous messages). The reader only needs to know the fastest and slowest index info and map this to VTK's "x" (fast), "y", and "z" (slow).
Check the number of HDF5 API calls. By maintenance costs I mean: documentation, tutorials, bug fixes, testing, CI, apps integration, issue tickets....lines of code is hardly an indicator of cost. The current APIs already provide a solution we can ship.
Going from Titan to Summit (and Frontier) would require new strategies in the computation and data placement. We still need to study these systems and their policies based on what apps need and we can/can't do before making decisions. I am collecting this info from other communities as well so we can add real value to apps, that's all. |
So I guess I don't know what adios2 image data is (maybe I missed it, or is just 2d arrays?). Unstructured grids are usually 1-d lists of nodes and associated data, aren't they? So the index order doesn't really apply. Generic HDF5 readers work most of the time, but not always, because they actually tend to assume that while HDF5 says the data is row-major, the actual data is column major so they un-reverse the dims, assuming they'd be reversed when writing. That works in the vast majority cases, because almost all scientific data tends to be in column-major order, and so the viz app's guess that what's really in the file is column-major even though it says otherwise tends to be a good guess. But you'll find plenty of people that are unhappy that they write their HDF5 file according to the standard but Paraview transposes the data for them because it thinks it knows better. The point is that Paraview has to make a guess, because HDF5 has no way to recover the information. Adios2 is doing better in that it stores whether the data has been written by Fortran or C, but it gets things wrong if one wants to write column-major from C/C++. Which, again, is a really common case, of Fortran's importance in HPC, so people moving to other languages tend to keep the data layout the same so that the data can be shared between the languages. Which is exactly the reason why you may see You don't have to take it from me, I'll quote this part from @berkgevici from #1553:
Of course, not everything is x,y,z. Say, people may use 3d arrays indexed by (longitude,latitude,height). Even in this case, though, it would be nice if the data after reading it in a generic tool would not be indexed by (height,latitude,longitude), though visualizing it as (x,y,z) would be not optimal either way. So maybe one shouldn't be focused on getting "xyz" vs "zyx" correct, what this is about is to get the indices to be in the same order as the original data.
I don't think I suggested to follow an HDF5 API (which in all those calls still missed describing the data layout). Actually, LOC is a pretty good metric for maintenance cost, and many of the things you mention (bug fixes, testing, CI, bugs) tend to scale with LOC. I don't think one should freely add all kinds of stuff to a API for no good reason and without proper consideration. I don't think my proposed way is the only possible way to do it, but it does fill a gap which is not handled by the current API. Also, after me having the point made so many times, I think it's misleading to say "The current APIs already provide a solution we can ship." It does not provide a solution to reading data back with its original indexing preserved if one uses C/C++ to write column-major data, which, again, is a common case. Of course you can ship it anyway, and it's still usable. HDF5 has lived with this limitation forever, but why not use the opportunity and learn from others' mistakes? Having said that, there are some real issues / costs:
I didn't mean concurrent in the sense of multi-threaded or anything, I mean the Fortran bindings aren't designed in a way where you can open a file in C++ and write some blocks for a Variable, and then write some more blocks in Fortran, even in a serial program. |
This goes back to supporting the selected group of users by adding an extra attribute if that information is important to preserve. My experience is that the latest hasn't been required for visualization (knowing which index is fast/slow is sufficient and if the dims were reversed). What Berk requested was addressed by adding the info at the block level, check the last sentence from his request (after the quote you C&P) which predates that PR.
This is independent of I/O library, I can save file in VTK format and still be presented in Paraview as VTK's "x"->fastest, "y", "z"-> slowest. I don't think there is a workaround if "x" is not your fastest index (or an index at all) in your simulation as it's a design decision in VTK (just like it's a design decision in HDF5, readers just need to accommodate to VTK). I put more info in a previous response. Anyways, I'd like to hear more from other users and bring awareness of what adios2 does on the topic. You showed that your app still can use adios2 in its current state. |
You're exactly right that knowing the provided data layout and whether dims is sufficient (and necessary). However, you also are telling me that I need to reverse the dims myself, which I do, but there is no way to tell adios2 that I did, so that information is not stored and hence not available when reading the data back in generic tools. So that information you seem to agree is needed is not available. I've said that the API would primarily be used by a small group of developers, those that work on generic analysis / viz tools. However, lots of people use those tools, and they're affected when the dims end up being reversed. That includes people using the included adios2 python interface. I'll file that as a separate issue. |
Either way we need to track layout outside of adios2 (it's either a call to
Read:
The proposed changes would be:
Read
The one thing it's still an open question is if SetLayout is different from the original layout then adios2 would transpose the data. Which is expensive, but that would be my first impression when looking at the code below: Read
|
Well, if I use an API function, I'm telling adios2, so that it can keep track internally instead, that's really the major goal here.
I like code ;)
But this wouldn't work, because I did Side note: I guess you better not run this on an MPI process that happened to not write a block for this variable, because
You don't have to set the layout at read time, it'll be row-major by default (this is all a C++ API, and row-major is what's currently always assumed, so behavior is unchanged). If you do, you get the shape that matches the layout you just selected. If you select the original layout, you get the original shape, which is good if you can support it (python/ndarray can, in C++ it depends on what you use for your multi-d arrays. If you only use a multi-d array that's fixed to be row-major, you don't need write the code above, you just read it, and deal with the fact that maybe it's reversed. That's what vtk would do, it'd just read as row-major. The dims may be reversed, and that info is available from The new API, as used above, is admittedly somewhat awkward. It would be more natural to just always return the data in its original shape and layout, as should (and would) be done by the python bindings. But in the C++-API, there isn't any data model beyond using simple pointers, because there is no standardized multi-d array data structure in C++ in the first place. So this only provides the pieces that can be used to build such behavior on top of it. And it's done in a way that preserves compatibility with current behavior.
No, the data would never be transposed, only the dims get reversed, just like is currently the case (that's my proposal, anyway, one could transpose the data, but as you say, that's expensive, so it's usually avoided). |
Kai, let me try to recap what you are proposing here:
What happens if a reader sets the layout to the opposite layout in the same language? Do we need to transpose the data, or assume that the user reverses the dimensions in the read selection for the proper reading, or throw an exception? Probably you said so already, to go for the second option, right? |
How about an extension to the API with new This way we could have an unmutable property of the Variable object on both sides. Majority of the codes can use the original functions, while Fortran/C++ mixed codes and others could declare the order. Same effect as your On the read size, I don't think I added or removed anything from your original proposal, so correct me if I misunderstood what you have been proposing. |
Basically, it is about keeping the current approach of:
The primary difference is in allowing C++ (and hence python) to state what data layout they have/want, rather than assuming it must be row-major. BP3/BP4 already have a flag that describes whether data is row-major/col-major. Eventually, in future formats, it'd be helpful to support having that flag be per-variable, rather than (somewhat) global (it's per PG right now, I believe). The invariant is that the data is never reordered.
CXX
The most interesting case: Data has been written (probably by fortran) as col-major. It's originally 5x10. On disk it's recorded as "col-major 5x10". Currently, in C++, we see But if you're using a C++ multi-d array class that supports flexible data layouts, or python, you can say // var.Shape() == {10,5} as of now
layout = var.GetOriginalLayout(); // returns ColumnMajor
if (layout == ColumnMajor) {
var.SetLayout(ColumnMajor);
// var.Shape returns {5, 10} now
auto field = Array<ColMajor>(var.Shape());
var.Get(field.data()); // read it
// can use field(i, j) to access things in original order
} else {
auto field = Array<RowMajor>(var.Shape());
var.Get(field.data()); // read it
// can use field(i, j) to access things in original order
} |
Sorry for the cross-over, I just wrote the reply above before reading your subsequent comment. You're right, auto var = io.InquireVariable<T>("abc", Layout::Original); which would also allow the flexibility to specify, e.g., I like an enum class because of future extensibility. For example, when using the hdf5 engine, it could return While I'm not too concerned about someone calling |
In C++ I'd suggest to follow boost, Eigen in case this ever becomes standard in C++ (makes life easier to explain to C++ users) as an extra template parameter: so something along the lines of what @pnorbert suggested:
order is a extendable type, so far the same as numpy:
|
@williamfgc I am fine with that. I was just trying to move the additional functionality into the Define and Inquire steps to have an immutable flag but did not know how to do it right in C++. But can we keep the original DefineVariable format as well? |
@pnorbert yes, we'll have to assume defaults not to break current APIs. I'd have to experiment a bit before confirming this 100% applying partial template specialization.
|
As far as |
Code extracted from: https://github.com/pybind/pybind11.git at commit 80d452484c5409444b0ec19383faa84bb7a4d351 (v2.4.3). Upstream Shortlog ----------------- Ahuva Kroizer (1): 8f5b7fce FAQ addition (ornladios#1606) Alexander Gagarin (2): 0071a3fe Fix async Python functors invoking from multiple C++ threads (ornladios#1587) (ornladios#1595) b3bf248e Fix casting of time points with non-system-clock duration with VS (ornladios#1748) Allan Leal (1): e76dff77 Fix for Issue ornladios#1258 (ornladios#1298) Andre Schmeißer (1): 19189b4c Make `overload_cast_impl` available in C++11 mode. (ornladios#1581) Ansgar Burchardt (1): a22dd2d1 correct stride in matrix example and test Antony Lee (6): a303c6fc Remove spurious quote in error message. (ornladios#1202) 0826b3c1 Add spaces around "=" in signature repr. 8fbb5594 Clarify error_already_set documentation. 55dc1319 Clarify docs for functions taking bytes and not str. 58e551cc Properly report exceptions thrown during module initialization. baf6b990 Silence GCC8's -Wcast-function-type. (ornladios#1396) Axel Huebl (9): 4b84bad7 Fix Travis GCC 7 Python 3.6.6 (ornladios#1436) 97b20e53 CMake: Remember Python Version (ornladios#1434) 3a94561c Debug Builds: -DPy_DEBUG (ornladios#1438) 435dbdd1 add_module: allow include as SYSTEM (ornladios#1416) 9424d5d2 type_record: Uninit Member (ornladios#1658) 1c627c9e pybind11_getbuffer: useless safe nullptr check (ornladios#1664) a2cdd0b9 dict_readonly: member init (ornladios#1661) 000aabb2 Test: Numpy Scalar Creation (ornladios#1530) 38f408fc value_and_holder: uninit members (ornladios#1660) Baljak (1): 81da9888 Fix Intel C++ compiler warning on Windows (ornladios#1608) Blake Thompson (1): 30c03523 Added __contains__ to stl bindings for maps (ornladios#1767) Boris Dalstein (2): b30734ee Fix typo in doc: build-in -> built-in 96be2c15 Fix version mismatch typos in .travis.yml (ornladios#1948) Boris Staletic (4): 289e5d9c Implement an enum_ property "name" f4b4e2e9 Use new Doxygen archive URL - fixes Travis 0ca6867e Avoid Visual Studio 2017 15.9.4 ICE b3f0b4de new sphinx (ornladios#1786) Borja Zarco (2): e2b884c3 Use `PyGILState_GetThisThreadState` when using gil_scoped_acquire. (ornladios#1211) b2fdfd12 Avoid use of lambda to work around a clang bug. (ornladios#1883) Bruce Merry (2): 1e6172d4 Fix some minor mistakes in comments on struct instance 3b265787 Document using atexit for module destructors on PyPy (ornladios#1169) Chris Rusby (1): 22859bb8 Support more natural syntax for vector extend Christoph Kahl (1): 640b8fe6 fix ornladios#1406 add mingw compatibility (ornladios#1851) Dan (10): a175b21e Avoid decoding already-decoded strings from cindex. 4612db54 Try to autodetect the location of the clang standard libraries. b46bb64d Allow user to override default values of -x and -std=. a33212df Wrap the main functionality of mkdoc in a function. ede328a7 Allow writing output to file instead of stdout. a163f881 Delete partially-written file in the event of an error. 590e7ace Avoid storing global state. 2c8c5c4e Split into seperate functions for easier invocation from python. e0b8bbbc Use a file-local constant for non-prefixing nodes. 41f29ccd Parse command-line args in a separate function. Darius Arnold (1): 09330b94 Fix typos in documentation (ornladios#1635) David Caron (1): 307ea6b7 Typo Davis E. King (1): 9343e68b Fix cmake scripts so projects using CUDA .cu files build correctly. (ornladios#1441) Dean Moldovan (3): c10ac6cf Make it possible to generate constexpr signatures in C++11 mode 56613945 Use semi-constexpr signatures on MSVC 0aef6422 Simplify function signature annotation and parsing Dennis Luxen (1): 221fb1e1 Untangle cast logic to not implicitly require castability (ornladios#1442) Dmitry (1): 8f5a8ab4 Don't strip debug symbols in RelWithDebInfo mode (ornladios#1892) Elliott Sales de Andrade (1): 5e7591c6 Update PyPI URLs. Eric Cousineau (2): e9ca89f4 numpy: Add test for explicit dtype checks. At present, int64 + uint64 do not exactly match dtype(...).num 4a3464fd numpy: Provide concrete size aliases Francesco Biscani (1): ba33b2fc Add -Wdeprecated to test suite and fix associated warnings (ornladios#1191) François Becker (1): ce9d6e2c Fixed typo in classes.rst (ornladios#1388) Guilhem Saurel (2): e7ef34f2 compatibility with pytest 4.0, fix ornladios#1670 43a39bc7 ignore numpy.ufunc size warnings Henry Schreiner (10): 04b41f03 Upgrading to Xcode 9 & fix OSX/Py3 build failure cf0d0f9d Matching Python 2 int behavior on Python 2 (ornladios#1186) 6c62d279 Fix for conda failures on Windows ffd56ebe Fix pip issues on AppVeyor CI (ornladios#1369) 3789b4f9 Update C++ macros for C++17 and MSVC Z mode (ornladios#1347) 0f404a5d Allow recursive checkout without https (ornladios#1563) ae951ca0 CI fixes (ornladios#1744) 73b840dc Fix for issue in latest conda (ornladios#1757) 9bb33131 Fixing warnings about conversions in GCC 7+ (ornladios#1753) 047ce8c4 Fix iostream when used with nogil (ornladios#1368) Ian Bell (1): 502ffe50 Add docs and tests for unary op on class (ornladios#1814) Igor Socec (1): a301c5ad Dtype field ordering for NumPy 1.14 (ornladios#1837) Ivan Smirnov (1): d1db2ccf Make register_dtype() accept any field containers (ornladios#1225) Ivor Wanders (1): 2b045757 Improve documentation related to inheritance. (ornladios#1676) Jamie Snape (1): a0b8f70d Ensure PythonLibsNew_FOUND is set in FindPythonLibsNew module (ornladios#1373) Jason Rhinelander (31): c6a57c10 Fix dtype string leak d2757d04 Remove superfluous "requires_numpy" 64a99b92 Specify minimum needed cmake version in test suite 1b08df58 Fix `char &` arguments being non-bindable 7672292e Add informative compilation failure for method_adaptor failures 6a81dbbb Fix 2D Nx1/1xN inputs to eigen dense vector args a582d6c7 Build /permissive- under VS2017 835fa9bc Miscellaneous travis-ci updates/fixes 32ef69ac Qualify `cast_op_type` to help ICC 5c7a290d Fix new flake8 E741 error from using `l` variable 71178922 __qualname__ and nested class naming fixes (ornladios#1171) 086d53e8 Clean up eigen download code (and bump to 3.3.4) 3be401f2 Silence new MSVC C++17 deprecation warnings 48e1f9aa Fix premature destruction of args/kwargs arguments 367d723a Simplify arg copying b48d4a01 Added py::args ref counting tests 88efb251 Fixes for numpy 1.14.0 compatibility 507da418 Use a named rather than anon struct in instance 326deef2 Fix segfault when reloading interpreter with external modules (ornladios#1092) adbc8111 Use stricter brace initialization 657a51e8 Remove unnecessary `detail::` add56ccd MSVC workaround for broken `using detail::_` warning 431fc0e1 Fix numpy dtypes test on big-endian architectures 1ddfacba Fix for Python3 via brew e88656ab Improve macro type handling for types with commas 9f41c8ea Fix class name in overload failure message 6d0b4708 Reimplement version check and combine init macros 6862cb9b Add workaround for clang 3.3/3.4 e763f046 Base class destructor should be virtual f7bc18f5 Fix compatibility with catch v2 177713fa Fix gcc-8 compilation warning Jeff VanOss (3): 05d379a9 fix return from std::map bindings to __delitem__ (ornladios#1229) 01839dce remove duplicate feature from list (ornladios#1476) 77ef03d5 compile time check that properties have no py:arg values (ornladios#1524) Jeffrey Quesnelle (1): f93cd0aa PYBIND11_TLS_REPLACE_VALUE should use macro argument value in Python 3.7+ (ornladios#1683) Jeremy Maitin-Shepard (1): a3f4a0e8 Add support for __await__, __aiter__, and __anext__ protocols (ornladios#1842) Josh Kelley (1): 741576dd Update documentation for initialize_interpreter (ornladios#1584) Justin Bassett (1): 2cbafb05 fix detail::pythonbuf::overflow()'s return value to return not_eof(c) (ornladios#1479) Jörg Kreuzberger (1): 69dc380c ornladios#1208 Handle forced unwind exception (e.g. during pthread termination) Karl Haubenwallner (1): e9d6e879 Added a debug flag to the PYBIND11_INTERNALS_VERSION (ornladios#1549) Khachajantc Michael (1): e3cb2a67 Use std::addressof to obtain holder address instead of operator& Krzysztof Fornalczyk (1): 5c8746ff check for already existing enum value added; added test (ornladios#1453) Lori A. Burns (3): bdbe8d0b Enforces intel icpc >= 2017, fixes ornladios#1121 (ornladios#1363) 868d94fc Apply c++ standard flag only to files of CXX language. (ornladios#1678) f6c4c104 restores __invert__ to arithmetic-enabled enum, fixes ornladios#1907 (ornladios#1909) Maciek Starzyk (1): 9b028562 Update PyPI URLs Manuel Schneider (1): 492da592 another typo (ornladios#1675) Marc Schlaich (1): ab003dbd Correct VS version in FAQ Matthias Geier (1): 7bb1da96 fix copy-paste error: non-const -> const Michael Goulding (1): 77374a7e VS 15.8.0 Preview 4.0 has a bug with alias templates (ornladios#1462) Michał Wawrzyniec Urbańczyk (1): 978d439e Add PYBIND11_ prefix to the THROW macro to prevent name collisions. (ornladios#1578) Naotoshi Seo (1): 5ef1af13 Fix SEGV to create empty shaped numpy array (ornladios#1371) Nathan (1): 9b3fb053 Allow Windows.h min/max to coexist with pybind11 (ornladios#1847) Omar Awile (2): ac6cb91a Fixed small typo (ornladios#1633) 95f750a8 Add optional buffer size to pythonbuf::d_buffer constructor (ornladios#1687) Patrik Huber (1): 41a4fd8a Fix missing word typo Pauli Virtanen (1): c9d32a81 numpy: fix refcount leak to dtype singleton (ornladios#1860) Roland Dreier (2): 7a24bcf1 Fix malformed reST (ornladios#1802) 1aa8dd17 Fix assertion failure for unions (ornladios#1685) (ornladios#1709) Rune Paamand (2): 73634b6d Update iostream.h: Changed a local varname 'self' to 'self_' (ornladios#1535) 06d021b6 Issue ornladios#1532: Incompatible config options, /MP vs /Gm for MSVC in DEBUG (ornladios#1533) Ryota Suzuki (1): 1377fbf7 Fix unintentional escaping of character on Windows (ornladios#1574) (ornladios#1575) Samuel Debionne (2): 87fa6a43 Detect whether we are running in a Conda environment and adjust get_include() (ornladios#1877) 6ca312b3 Avoid infinite recursion in is_copy_constructible (ornladios#1910) Saran Tunyasuvunakool (2): b60fd233 Make sure `detail::get_internals` acquires the GIL before making Python calls. (ornladios#1836) bdf1a2cc In internals.h, only look at _DEBUG when compiling with MSVC. (ornladios#1855) Semen Yesylevskyy (1): ef13fb2e Info about inconsistent detection of Python version between pybind11 … (ornladios#1093) Sergei Izmailov (3): 979d75de doc: Add note about casting from `None` to `T*` (ornladios#1760) 09f08294 Avoid conversion to `int_` rhs argument of enum eq/ne (ornladios#1912) 6cb584e9 Adapt to python3.8 C API change (ornladios#1950) Sergei Lebedev (2): 08b0bda4 Added set::contains and generalized dict::contains (ornladios#1884) 046267c6 Added .empty() to all collection types (ornladios#1887) Stephen Larew (1): 5b4751af Add const to buffer:request() (ornladios#1890) Steven Johnson (1): 4ddf7c40 Add missing includes for better Bazel compatibility (ornladios#1255) Tarcísio Fischer (1): 54eb8193 Fix scoped enums comparison for equal/not equal cases (ornladios#1339) (ornladios#1571) Ted Drain (1): 0a0758ce Added write only property functions for issue ornladios#1142 (ornladios#1144) Thomas Hrabe (1): 534b756c Minor documentation clarification in numpy.rst (ornladios#1356) Thomas Peters (1): dffe869d quiet clang warning by adding default move ctor (ornladios#1821) Tom de Geus (1): a7ff616d Simplified example allowing more robust usage, fixed minor spelling issues Tomas Babej (1): 01fada76 Minor typo Toru Niina (1): 74d335a5 Replace a usage of C++14 language features with C++11 code (ornladios#1833) Trevor Laughlin (1): 63c2a972 Enable unique_ptr holder with mixed Deleters between base and derived types (ornladios#1353) Unknown (1): 0b3f44eb Trivial typos Vladimír Vondruš (2): 5b0ea77c Fix -Wmissing-prototypes warning on Clang. (ornladios#1863) 04c8f4b5 Expose BufferError among other pybind11 exceptions. (ornladios#1852) Wenzel Jakob (51): f94d7598 updated changelog for v2.2.1 release 6d19036c support docstrings in enum::value() (ornladios#1160) e7d304fb added citation reference (fixes ornladios#767) (ornladios#1189) 15e0e445 Moved section on licensing of contributions (fixes ornladios#1109) (ornladios#1188) ff6bd092 Fix pybind11 interoperability with Clang trunk (ornladios#1269) 2d0507db added v2.2.2 changelog 060936fe Detect pybind11 header path without depending on pip internals (fixes ornladios#1174) (ornladios#1190) ed670055 Minor fix for MSVC warning CS4459 (ornladios#1374) f5f66189 updated changelog for v2.2.3 cbd16a82 stl.h: propagate return value policies to type-specific casters (ornladios#1455) d4b37a28 added py::ellipsis() method for slicing of multidimensional NumPy arrays 885b5b90 Eigen test suite: don't create a np.matrix e0f3a766 Fixed flake8 error in test_iostream.py 44e39e0d fix regression reported by @cyfdecyf in ornladios#1454 (ornladios#1517) 35c82c72 changelog for version 2.2.4 & features targeted for 2.3.0 06710020 object_api: support the number protocol b4b22924 relax operator[] for tuples, lists, and sequences f4245181 enum_: move most functionality to a non-template implementation c8e9f3cc quench __setstate__ warnings (fixes ornladios#1522) c9b8933e flake8 fix 9f73060c std::array<> caster: support arbitrary sequences (ornladios#1602) adc2cdd5 fixed regression in STL type caster RVPs (fixes ornladios#1561) (ornladios#1603) e2eca4f8 Support C++17 aligned new statement (ornladios#1582) cea42467 fix py::cast<void *> (ornladios#1605) 64205140 added std::deque to overview.rst d1f64fa9 AppVeyor: quench pip deprecation warnings for v2.7 ccbe68b0 added binding delattr() -> PyObject_DelAttr analogous to hasattr() 25abf7ef flake8 fixes e11e71d8 Make compiler flags for -Werror specific to GNU, Clang, or Intel 51ca6b08 Update docs on std::out_of_range exception mapping (ornladios#1254) cf36e3d9 updated changelog 64f2a5f8 begin work on v2.3.1 ed39c504 README.md: added several folks who've made repeated contributions a1b71df1 fix issue ornladios#1804 (warning about redefined macros) 8b90b1da error_already_set: acquire GIL one line earlier (fixes ornladios#1779) 9fd47121 fix test suite (pytest changes in ExceptionInfo class) b2c4ff60 renamed local gil_scoped_acquire to gil_scoped_acquire_local to avoid ambiguity c9f5a464 pybind11 internals: separate different compilers 00a0aa99 v2.4.0 release e825205a begin working on v2.4.1 21d0eb46 Fix Python 3.8 test regression 5fd187eb minor changelog cleanup 31680e6f Implicit conversion from enum to int for Python 3.8 (fix by @sizmailov) e44fcc3c v2.4.1 release 82cf7935 begin working on next version f3109d84 future-proof Python version check from commit 31680e6 (@lgritz) 7f5dad7d Remove usage of C++14 constructs (fixes ornladios#1929) 7ec2ddfc v2.4.2 release 2abd7e1e updated release.rst to remove parts that are now automated 34c2281e begin working on next version 80d45248 v2.4.3 release Yannick Jadoul (4): b4719a60 Switching deprecated Thread Local Storage (TLS) usage in Python 3.7 to Thread Specific Storage (TSS) (ornladios#1454) 085a2943 Increasing timeout in test_gil_scoped.py to get AppVeyor to succeed 97784dad [BUGFIX] Fixing pybind11::error_already_set.matches to also work with exception subclasses (ornladios#1715) d23c821b Make static member functions, added with `def_static`, `staticmethod` descriptor instances (ornladios#1732) Zach DeVito (1): 03874e37 Fix leak in var arg handling ali-beep (1): 5ef13eb6 Add negative indexing support to stl_bind. (ornladios#1882) cdyson37 (1): 111b25b2 Mention flake8 and check-style.sh in CONTRIBUTING (ornladios#1567) kingofpayne (1): 12e8774b Added support for list insertion. (ornladios#1888) luz.paz (2): 28cb6764 misc. typos 13c08072 Typo luzpaz (2): 4b874616 Misc. typos (ornladios#1384) 21bf16f5 misc. comment typo (ornladios#1629) martinRenou (1): 35045eee Add getters for exception type, value and traceback (ornladios#1641) nstelzen (1): c2514340 Added note in documentation regarding make install (ornladios#1801) oremanj (2): fd9bc8f5 Add basic support for tag-based static polymorphism (ornladios#1326) e7761e33 Fix potential crash when calling an overloaded function (ornladios#1327) phil-zxx (1): c6b699d9 Added ability to convert from datetime.date to system_clock::time_point (ornladios#1848) sizmailov (1): 21c3911b add signed overload for `py::slice::compute` voxmea (1): 17983e74 Adds type_caster support for std::deque. (ornladios#1609)
This is somewhat BP-X related, but also a more generic API issue, so I figured I'll put the details here.
Language-independent handling of row-major vs col-major data layout
Here's a proposal to increase flexibility in handling data layout while keeping the existing behavior / APIs unchanged.
Rationale
Currently, the ADIOS2 API is built around the assumption that the language of the API bindings determines the data layout that the application is using, in particular Fortran/Matlab/R: column-major; otherwise: row-major (see
adiosSystem.cpp
). That covers most common applications. In particular reading and writing data from the same language works nicely. However, mixing languages or data layouts is more complicated.When Fortran writes, say, a 400x200 (column-major) array, it will be presented in the C/C++ API as a 200x400 row-major array. In order to present the data as a row-major array, there are only two options: reverse the shape, or transpose the data. Since transposition is potentially expensive, ADIOS2 make the perfectly reasonable choice to instead reverse the dimensions.
However, the assumption that the language of the API determines the data layout used is not always valid. For example:
The main part of a legacy code remains written in Fortran, hence it's using column-major data layout for its arrays. However, some parts of the code are rewritten in C++, e.g., in order to enable use of CUDA / Kokkos (this is currently happening for XGC). At this point, if one writes data using the adios2 C++ API, it makes the assumption that the data layout is row-major, even though it actually is not. This can be worked around on the application side by pretending that the column-data is row-major if the app reverses dims itself. However, one loses the information whether a dataset is actually row-major, or just pretending to be.
Essentially the same issue happens in a pure C++ code that uses data structures that aren't (necessarily) row-major, like, e.g., xtensor or Kokkos Views. Again, it can be worked around by reversing dims on the application side, but the info what the real layout is gets lost.
A general visualization app, say written in C++, sees a data set of size (200x400). It doesn't know whether that means there are really 200 points in x direction and 400 points in y direction, or whether it is looking at an actual 400x200 data set that had been written using the Fortran API. This is somewhat solved for the unique language->layout mapping case by Exposes IsReverseDims in BlockInfo #1573, though (a) the app has to go down to the block level to check the
IsReverseDims
flag and (b) theIsReverseDims
flag doesn't directly tell the original data layout, but requires additional logic from the application. The viz app then has to reverse dims itself again to read data into a data structure using the original layout.The visualization app still has no way of knowing whether the writing app had reversed the dims itself because it falls into either of the cases above.
How relevant is this?
XGC is going to a mixed Fortran / C++ model, where C++ is using col-major data structures. For now, writing the data is still handled in Fortran, so there is no actual issue yet.
My particle-in-cell code (PSC) falls into the category of a C++ code using column-major data layout (for historic reasons), so I currently I'm passing fake (reversed) dims to ADIOS2 to make things work. (Actually, the same has to be done when using HDF5, but I think this is really a weakness of HDF5 which shouldn't be perpetuated.)
@berkgeveci supported the usefulness of knowing row-major vs col-major in row/column major considerations #1553.
Proposed data format changes
BP3 / BP4 have global flags that indicate the host language, ie., assumed data layout. That's helpful, but having per-dataset granularity would be better. Per-dataset granualarity would be better, so one could write both row-major and col-major dataset into the same file. The API changes I'm proposing don't necessarily require changes to the data format, which might well be not possible at this point for compatibility reasons, but I think it'd be good to consider this for future evolutions of the data formats. One could use attributes to store the information as attributes given that existing mechanism, but I think it's a bit iffy since there is no distinction between user-defined and system-defined attributes, so there could be
conflicts if one were to do that.)
The existing BP3 / BP4 would support the additional API described below, but they would not support mixing row-major and col-major datasets in a single file.
Proposed API
(I'm just making this up as I'm writing this. But I hope it'll give a useful basis for thinking about it.)
Each dataset (
Variable
) would have aLayout
property associated with it. When creating/inquiring variable for writing/reading, theLayout
property would be set to the current host language.Add a
Variable::setLayout()
function that would allow the application to override the default layout. E.g., in C++ I could go saymyVar.setLayout(Layout::ColumnMajor)
to tell adios2 that what I'm passing to it will be laid out in ColumMajor. For now, adios2 would just revere the dims for me, writing the data while pretending it's the row-major layout that it's expecting. Ideally, adios2 would also record the original data layout, but that's data format dependent(see above).
When reading, the same mechanism would apply. If I know the data I'll be reading is actually column-major, and my data structure expects the blob of data to be in column-major, I'll do a
myVar.SetLayout(Layout::ColumnMajor)
first. If the actual data is not already columnn-major, adios2 would reverse dims for me (basically just like what currently happens when file language and API host language don't match, but it'd be based on the layout I'm telling, not the language I'm using, though in the default case nothing would change.)Finally, add a
Variable::GetOriginalLayout()
function, which would for a given variable tell me what the original layout was when it was written. So a dataset written by Fortran would return "ColumnMajor", while a dataset written in C++ would return "RowMajor". That way, an application that uses a data structure that supports both row-major and col-major could load the data back into the appropriate data structure. without having to deal with either reversing dims or transposing data. Something like this:The text was updated successfully, but these errors were encountered: