Error accessing OxfordHDF5 files #164

Open
Philip-Go opened this issue Oct 21, 2022 · 28 comments

Comments

@Philip-Go

Philip-Go commented Oct 21, 2022

Hi everybody,

I'm using EMsoft 5.0 from a compiled nightly-build package and have been using it with Oxford binary .ebsp files from AZtec versions 3.1 to 4.3. With that combination, DI ran smoothly and produced great maps of nanocrystalline metals and deformed microstructures.

However, one of our EBSD systems recently had its AZtec software updated from 3.1 to 6.0. The update must have changed the Oxford binary file format, because EMEBSDDIpreview can no longer access the .ebsp file.
This may be related to the HDF5 file: the Oxford HDF5 format (.h5oina) now contains the uncompressed Kikuchi patterns, which are nicely accessible, for example in Python via h5py and matplotlib. The path to access them is '1/EBSD/Data/Processed Patterns'.
Unfortunately, the EMsoft inputtype OxfordHDF leads to a hard-coded break. From the code I found that, as of 2019, OxfordHDF5 files did not contain any Kikuchi patterns, so no reading routine was implemented.
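
For readers who want to look at the patterns directly, here is a minimal sketch of reading a block of patterns with h5py. It assumes the dataset path quoted above and a (pattern, height, width) layout; `read_patterns` is a hypothetical helper name, not part of EMsoft or h5py.

```python
PATTERNS_PATH = "1/EBSD/Data/Processed Patterns"  # dataset path quoted in the post


def read_patterns(path, start, stop, dataset=PATTERNS_PATH):
    """Return patterns start..stop-1 as a NumPy array (hypothetical helper)."""
    import h5py  # third-party; imported lazily so this module loads without it

    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset reads only the requested patterns from disk.
        return f[dataset][start:stop]
```

A single pattern returned this way is a 2D array that can be passed to matplotlib's `imshow`.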

However, now the Kikuchi patterns are there. I'm not that familiar with HDF5 yet, but I assumed that its advantage is data accessibility independent of the hardware manufacturer. So I tried to run EMEBSDDIpreview on the Oxford HDF5 file with inputtype BrukerHDF, which failed, and with TSLHDF, which succeeded! I then tried dictionary indexing (EMEBSDDI) on the same file with TSLHDF, which started out nicely. Eventually, however, the program failed during pattern preprocessing between rows 260 and 265 of 500 with the error:

Completed row 260 of 500 rows
Error code : -1
returned by routine HDF_readHyperslabIntegerArray3D:h5dread_f:Processed Patterns

Interestingly, when I define a ROI from 1st to 250th column, the DI program runs smoothly and returns a proper .ctf file:

Completed row 250 of 250 rows
-> experimental patterns preprocessed
Number of experimental patterns processed per second : 34883.7
-> computing Average Dot Product map (ADP)
-> Number of threads set to 8
Oct 21 2022 11:56:12.770 AM
min./max. dot product = 0.768612 / 0.867969; 1.2% complete
min./max. dot product = 0.768987 / 0.870098; 2.4% complete
etc...

The dataset contains 375,000 data points (Kikuchi patterns) in 750 columns and 500 rows; each Kikuchi pattern is 168x128 px.

My question now is whether someone has implemented, but not released, a routine for reading OxfordHDF files containing Kikuchi patterns, or has an idea why the TSLHDF setting only works up to a specific row. I am experienced with neither HDF5 nor Fortran, but according to the error message, the error occurs in the h5dread_f call in the HDF_readHyperslabIntegerArray3D procedure.

edit:
One thing I should perhaps mention: if I choose the ROI from row 250 to 500 (lower half), the pattern preprocessing breaks after 10 rows, probably at the exact same row/pattern where processing of the whole data set fails.

Thank you for any ideas or comments

@marcdegraef
Collaborator

marcdegraef commented Oct 21, 2022 via email

@hakonanes
Contributor

hakonanes commented Oct 21, 2022

Hi Philip and Marc,

we recently added a reader for Oxford's H5OINA files in kikuchipy (Python) as well (soon to be released). I only tested it on files which were not cropped to a region of interest. I assume our reader will not read a file with an ROI either, so I too would be interested in a small example H5OINA file with an ROI.

The H5OINA format specification might be helpful, including the definition of the ROI.

@Philip-Go
Author

Hello Marc and Hakon,

thanks for your quick reply!
The following link gives you an .h5oina file of a 75 x 75 EBSD map of an aluminum sample:
https://faubox.rrze.uni-erlangen.de/getlink/fiTbeDDgQ8ZfSz7ocD87vh/Al_SLBM%2010kV_8x8_small.h5oina

The Kikuchi-Pattern from the center point of that map is here:
Al_SLBM - 10kV_8x8_small MapCenter - Kikuchi

The Oxford Detector Parameters are:

Collected: 21.10.2022 16:15:13
Accelerating Voltage: 10.00 kV
Working Distance: 16.0 mm
Detector Insertion Distance: 174.9 mm
Tilt Axis: Parallel to X
Specimen Tilt (degrees): 70.0°
EBSD Camera Binning Mode: 8x8 (168x128 pixels)
EBSD Camera Exposure Time: 29.24 ms
EBSD Camera Gain: 0
Frame Averaging: 1 frames
Static Background Correction: On
Auto Background Correction: On
Pattern Center: (0.563, 0.570)
Detector Distance: 0.506 = 16.04 mm
Hough Resolution: 60
Band Detection Mode: Centers
Number of Bands Detected: 6
Indexing Mode: Refined Accuracy

The Oxford Detector Parameters translate to the following EMsoft parameters (d = 25 µm, binning = 8, tilt = 5°):
xPC = -10.584
yPC = 31.76
L = 17001.6
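
The three values above can be reproduced with the short sketch below. The formulas are inferred from the numbers in this post and are an assumption, not an official definition: all three Oxford PC components are taken to be normalized by the pattern width, the pattern size is 168x128 pixels, and L comes out in µm. `oxford_pc_to_emsoft` is a hypothetical helper name.

```python
def oxford_pc_to_emsoft(x_ox, y_ox, z_ox, width_px, height_px, binning, pixel_size_um):
    """Convert an Oxford pattern center to EMsoft-style (xpc, ypc, L).

    Assumed convention (inferred from the numbers in this thread):
    xpc, ypc are in binned pixels relative to the pattern center, with the
    x axis sign flipped; L is in micrometers.
    """
    xpc = -(x_ox * width_px - width_px / 2)
    ypc = y_ox * width_px - height_px / 2
    L = z_ox * width_px * binning * pixel_size_um
    return xpc, ypc, L


# Values quoted in the post: PC (0.563, 0.570), detector distance 0.506,
# 168x128 pixel patterns, 8x8 binning, 25 µm detector pixels.
xpc, ypc, L = oxford_pc_to_emsoft(0.563, 0.570, 0.506, 168, 128, 8, 25.0)
print(round(xpc, 3), round(ypc, 2), round(L, 1))
```

The detector tilt does not enter this conversion; it is a separate EMsoft parameter.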

I haven't tested this file with the inputtype='TSLHDF' flag, but I assume from the behavior of the big map that it should work. I'll try it.

@hakonanes: the map behind the link should also be an ROI as defined in the H5OINA spec, meaning that the EBSD map was acquired on a smaller area than the respective electron image. However, the ROI I was referring to in the opening post was the ROI defined in the EMEBSDDI.namelist file, i.e. a rectangular sub-region within the EBSD map.

Regards,
Philip

@hakonanes
Contributor

Thank you, Philip, with your file I could confirm that the mentioned reader in kikuchipy reads your file without issue.

I tried feeding the patterns and PC values automatically read from the file directly to PyEBSDIndex for Hough indexing (PC values converted via kikuchipy's EBSD detector to Bruker's PC convention used internally), which produced results as expected (map colors according to IPF-Z):

image

@Philip-Go
Author

That might be off topic regarding EMsoft, but the resulting IPF map looks well indexed. Does PyEBSDIndex achieve such a high indexing rate by Hough indexing alone, or was the map subjected to some cleaning and filtering afterwards? I'm wondering, since with online indexing during the measurement AZtec yields only a ~60% indexing rate.
I should definitely take a closer look at that, thanks for showing it!

@marcdegraef
Collaborator

marcdegraef commented Oct 21, 2022 via email

@Philip-Go
Author

Hi Marc, the detector geometry values we measured and obtained from Oxford are (as stated above):

tilt: 5°-6°, but could be 6.5°; I haven't determined it more accurately, but this may be within the measurement error
detector pixel size: 25 µm
binning (in this mapping): 8x

The PC detector parameters from Oxford are:
xPC(Ox) = 0.563
yPC(Ox) = 0.570
zPC(Ox) = 0.506

Translating this into the EMsoft definition, I get:
xPC = -10.584
yPC = 31.76
L = 17001.6 µm

If it is of any help, I could upload the large dataset as well. However, it is 7-8 GB in size.

Regards,
Philip

@marcdegraef
Collaborator

marcdegraef commented Oct 22, 2022 via email

@marcdegraef
Collaborator

I'm not sure why my images don't show up... is there a trick to make them show up ?

@marcdegraef
Collaborator

marcdegraef commented Oct 22, 2022

I figured it out; it appears that my mail program (Thunderbird) does not properly format responses to this list; here are the images:

reference image: pattern

best fit pattern: bestfit

IPFZ: Al-IPFZ

@marcdegraef
Collaborator

Here is the EMsoft .ang file; simply change the extension to .ang

Al-dp.txt

@Philip-Go
Author

Hello Marc,

thank you very much for looking into the file. I went offline for the last 2 days, but thank you for waiting for my reply.
I performed a DI calculation with the unrefined detector parameters (xPC = -10.584, yPC = 31.76, L = 17001.6 µm) and achieved comparable results, plotted in IPF-Z. For comparison I added a map from the plain AZtec acquisition data:
IPFZ_comparison

I will provide the larger data set; however, I have to check with my colleague that I'm allowed to share this specific measurement. Otherwise I will perform another measurement on a material that I can share.

Regarding the pattern center refinement, I have two questions:
What tool/program do you use for this? I have read about EMFit in another issue (https://github.com/EMsoft-org/EMsoft/issues/131#issuecomment-722445087), but I have no access to IDL.
The other question concerns your simulated Kikuchi pattern and the marked PC. The PC cross is in the left half, yet both your xPC and mine are negative (-13.4377 and -10.584). As far as I understand, since EMsoft 5.x the x-axis is defined as "positive to the left", with the origin in the middle, so negative values lie in the right half. Accordingly, the reference Kikuchi pattern from AZtec (middle position in the map, x35|y35) shows the pattern center in the right half:
Al_SLBM - 10kV_8x8_small MapCenter - Kikuchi
Is this simply due to a different plotting convention for the Kikuchi pattern? For experimental patterns from AZtec, such as this one, the origin is in the upper left corner, viewed from the detector.

Thanks again and regards,
Philip

@marcdegraef
Collaborator

Hi Philip,

that is reasonably good agreement between the two DI results; a change in pattern center will always give a slight rotational difference between the two data sets. I have a derivation of that if you are interested.

For the pattern center cross, let me look into that; it is entirely possible that I drew it on the wrong side. I use the IDL efit program to do this refinement. The EMsoft EMFit program does not work as well as the IDL version which is a bit more sophisticated. We have been working on more advanced routines, but for the time being they are still in the private EMsoft repository.

Marc.

@drowenhorst-nrl
Contributor

> That might be off topic regarding EMsoft, but the resulting IPF map looks well indexed. Does PyEBSDIndex achieve such a high indexing rate by Hough indexing alone, or was the map subjected to some cleaning and filtering afterwards? I'm wondering, since with online indexing during the measurement AZtec yields only a ~60% indexing rate. I should definitely take a closer look at that, thanks for showing it!

@Philip-Go thanks for providing the small dataset, which let @hakonanes do a test run with PyEBSDIndex. I have been doing some testing across different methods, and so far PyEBSDIndex has been performing well compared to EDAX's Hough indexing, which, among the vendors' software, has always had the higher hit rate in my experience. But I had been curious about how it might compare to AZtec. Certainly not a full benchmark ... but I am quite pleased that PyEBSDIndex is doing as well as it appears above.

And I would be remiss not to note that EMsoft dictionary indexing with refinement is still the gold standard in my testing.

Finally, in PyEBSDIndex there are also some PC refinement routines (see the doc/tutorials). I have found the best results by being able to send in a batch of patterns, ideally from different orientations, and minimizing the so-called "fit" parameter common to Hough-indexing. I am not sure if they would match the accuracy of the DI refinements, but they do have the advantage of not requiring IDL. But @marcdegraef and I might argue that learning a bit of IDL is probably good for you!

Current PyEBSDIndex does not account for any kind of variations in the PC due to scanning shifts. A lot of the mechanisms to do so are in the sub-programs, but the larger program flows have not been hooked together to enable that.

@Philip-Go
Author

> Hi Philip,
>
> that is reasonably good agreement between the two DI results; a change in pattern center will always give a slight rotational difference between the two data sets. I have a derivation of that if you are interested.
>
> For the pattern center cross, let me look into that; it is entirely possible that I drew it on the wrong side. I use the IDL efit program to do this refinement. The EMsoft EMFit program does not work as well as the IDL version which is a bit more sophisticated. We have been working on more advanced routines, but for the time being they are still in the private EMsoft repository.
>
> Marc.

Hi Marc,

I was expecting a slight deviation and confirmed it via the misorientation angle between your map and mine. Earlier this year I already had some experience with an entirely wrong yPC value, which led to an apparent rotation around the horizontal axis of roughly 20-30°. It was due to an error in the translation from Oxford's PC definition to EMsoft's.
The misorientation map is shown below. The yellow data points have a misorientation angle >=10° and are located at grain boundaries, so these points have been assigned to the respective other grain. Most points show a constant misorientation of 1-3°.
MisOri_DI2DI

@drowenhorst-nrl Thanks for the tip about the refinement routines! I will definitely take a look at PyEBSDIndex. Regarding IDL, learning something new is always good, but for now my conventional approaches have to fail first.

Last but not least, the reason for this issue, the larger Oxford HDF5 file, can be found at this link: https://faubox.rrze.uni-erlangen.de/getlink/fiAqni34HVdNHg2vBdqejc/DI_Al-SLBM_750x500.h5oina
I apologize for the surface quality (there was a large particle of dirt in the lower half of the scan), but in this case the issue is more about the file and data format than about materials science.

Regards,
Philip

@drowenhorst-nrl
Contributor

@Philip-Go @marcdegraef Just for future reference, I think the detector tilt is buried in the HDF5 dataset 1/EBSD/Header/Detector Orientation Euler. The middle Euler angle is 96.67, so under the EMsoft/EDAX convention the tilt looks like 6.67°. There are a few other small rotations attached, but they are pretty minor, and I bet they are more a product of the specific calibration session.

@marcdegraef
Collaborator

I computed the Average Dot Product map for the full 750x500 data set and I got the following result:
Al-large_ADP
Note that there are several regions where the computation did not produce the expected result. This is likely due to bad data in the file, but I'll have to take a closer look at that. Interestingly, the EMsoft code continues to read the file despite numerous HDF5 errors being reported. I will try to run the Dictionary Indexing code as well, just to see what comes out. More later.
Marc.

@marcdegraef
Collaborator

Here is the DI result (IPF-Z); it looks normal to me. I took a closer look at the HDF5 data, and for each line with a problem, the first pattern on the left of the bad portion of the row is the only one that is bad. HDFView lists the values of each pixel as ERROR. The rest of the row is ok, but the EMsoft routine apparently gives up at the first bad pattern. It is not clear to me what the reason is for the bad patterns, but I'm thinking it is some bug in the acquisition software...

Al-large-IPFZ

Here is the .ang file (change the extension to .ang)
Al-large.txt

@marcdegraef
Collaborator

and here is the Orientation Similarity Map, clearly delineating even the smallest grains...
Al-large-dp_OSM

@drowenhorst-nrl
Contributor

Marc -
Do you think it might have something to do with the lzf compression Oxford put on that dataset? Not sure why, but the h5py module can read those patterns where HDFView fails.

In the meantime, for the DI run, can you set the number of scan columns to 1 and the number of scan rows to 500*750? It would be a slower read-in, but it would prevent the wonky patterns from messing up the rest of the row.

@marcdegraef
Collaborator

marcdegraef commented Oct 27, 2022 via email

@drowenhorst-nrl
Contributor

Marc -
My nomenclature above was not great ... my thought was to make each row just one pattern long, thus a bad pattern does not spoil the rest of the row. There would have to be a lot of messing around after to get angs/hdf5s back into their correct shape.

Can confirm that IDL will not read the full pattern array. However, if I just read in a hyperslab that does not include a bad pattern, that does work. Wonder if the f90 hdf5 program is doing something similar then, but its error behavior is different; it reads a hyperslab into memory, but when it hits a bad point, it bails out, but whatever is in memory before the error can still be processed.

Now, what is happening with h5py, and why does it not trip over these bad points? I don't know. My only thought is that my version of h5py is built with HDF5 1.12.1; IDL is using 1.10.5; my installed EMsoft is linked against 1.10.7.

I have attempted to attach a txt file with a bad pattern (not the first bad one in the file, just one I identified: x_indx = 311, y_indx = 375, thus it should be pattern 311+375*750 = 281561). There do not appear to be any NaNs; the values range from 1 to 255, the same as in the patterns before and after it. The values themselves differ from those of the neighboring patterns.
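
The pattern number quoted above follows from row-major ordering of the 750-column map described earlier in the thread. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def flat_index(x, y, n_cols):
    """Row-major index of the pattern at column x, row y (both zero-based)."""
    return x + y * n_cols


def xy_from_index(index, n_cols):
    """Inverse of flat_index: return (x, y) for a flat pattern index."""
    y, x = divmod(index, n_cols)
    return x, y


# The map in this thread has 750 columns and 500 rows.
print(flat_index(311, 375, 750))   # pattern number of the bad pattern
print(xy_from_index(281561, 750))  # back to (x, y)
```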

Some final clues:

  • HDFView says that this dataset is chunked with 1x128x168, so one pattern at a time.
  • HDFView also indicates that the HDF5 lzf compression has been applied to this dataset. I think there is nothing that needs to be done in the HDF_READ call to read in with this compression, that is handled by the HDF5 library internally.
  • With the hyperslab reading in IDL, I can confirm I can read none of the points in that chunk, but before/after is fine.
  • And the grand conspiracy! Who contributed the lzf compression to HDF5?! Why it was the h5py group!
    So I am still thinking that there is something that is wonky with the compression, and perhaps an updated HDF5 library would help?
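
Because the dataset is chunked one pattern at a time, a reader that cannot decode some chunks can still locate them by reading pattern by pattern. The sketch below is one way to do that; `find_unreadable` is a hypothetical helper, and it assumes a failed read surfaces as an `OSError`, which is how h5py usually reports low-level HDF5 read errors.

```python
def find_unreadable(read_one, n_patterns):
    """Return the indices for which read_one(index) raises OSError.

    read_one is any callable that reads a single pattern, for example
    `lambda i: dset[i]` for an open h5py dataset `dset` (assumption).
    """
    bad = []
    for index in range(n_patterns):
        try:
            read_one(index)
        except OSError:
            # Record the failure and keep going instead of aborting the scan.
            bad.append(index)
    return bad
```

Reading 375,000 patterns one at a time is slow, so this is meant as a diagnostic pass rather than a production reader.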

badpat.txt

@marcdegraef
Collaborator

I think you're probably right that this may be an issue of different HDF libraries causing problems. It does reflect negatively on Oxford for writing a potentially corrupt file... but that's not a real surprise...

I wrote a little f90 program to read several consecutive rows of patterns. I get an error message for each row that has an issue, but there is a pattern in each column; it's just that the pattern is exactly the same for consecutive bad rows, hence the vertical striations in the ADP map above inside the bad stripes. Next thing to try is to overwrite the bad entries and see if the ADP map comes out intact.

Marc.

@marcdegraef
Collaborator

marcdegraef commented Nov 3, 2022 via email

@marcdegraef
Collaborator

ok, that conversion works fine! we do need to switch the pats.shape[1] and [2] parameters, but other than that it works perfectly... the new ADP map is shown below. So, there is no problem with the Oxford hdf5 file; it is likely an incompatibility issue between HDF5 libraries. I am going to try to use HDF5-1.12.2 with the EMsoftOO repository; this will require a few changes to the HDFsupport module, but hopefully that won't be too difficult to implement. More later.

Marc.
map_up1_ADP
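
The conversion script itself is not reproduced in this thread. As a rough illustration only, the sketch below writes 8-bit patterns to a .up1 file. It assumes the commonly described version-1 layout (four little-endian 32-bit unsigned integers: version, pattern width, pattern height, byte offset of the first pattern, followed by the raw patterns); check this against the reader you intend to use. `write_up1` is a hypothetical helper name.

```python
import struct

UP1_HEADER_BYTES = 16  # four 32-bit integers (assumed version-1 layout)


def write_up1(path, patterns, width, height):
    """Write 8-bit patterns to `path`; return the number of patterns written.

    `patterns` is an iterable of bytes-like objects, each holding
    width * height bytes in row-major order.
    """
    header = struct.pack("<4I", 1, width, height, UP1_HEADER_BYTES)
    count = 0
    with open(path, "wb") as fh:
        fh.write(header)
        for pattern in patterns:
            data = bytes(pattern)
            if len(data) != width * height:
                raise ValueError("pattern size does not match width * height")
            fh.write(data)
            count += 1
    return count
```

For an h5oina pattern array of shape (N, 128, 168), the width is `shape[2]` and the height is `shape[1]`, which matches the remark above about switching the two shape parameters. NumPy patterns can be passed as `pattern.tobytes()`.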

@drowenhorst-nrl
Contributor

Especially frustrating given that HDFView reports that the compression ratio on that array is 1.0. I am not going to pretend to be an expert on compression algorithms, but I do know that the one thing they do not handle well is random noise. And even "low-noise" EBSD patterns like these are pretty noisy images comparatively. The extra overhead is simply not worth it. I know EDAX experimented with compression on their saved patterns, and what they found was that the extra bytes needed to handle the compression, combined with the lack of actual compression, made the files larger.
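
The point about noise can be checked with a few lines of standard-library Python. This sketch uses zlib (DEFLATE) as a stand-in, not the LZF filter applied to the file, and compares synthetic random bytes with a smooth synthetic image of the same 168x128 size; neither input is real EBSD data.

```python
import random
import zlib


def compression_ratio(data):
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


WIDTH, HEIGHT = 168, 128
rng = random.Random(0)  # fixed seed so the comparison is repeatable

# Pure noise: every byte is independent and uniformly distributed.
noise = bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT))
# Smooth image: every row is a single constant gray level.
smooth = bytes((i // WIDTH) % 256 for i in range(WIDTH * HEIGHT))

print("noise ratio: ", compression_ratio(noise))
print("smooth ratio:", compression_ratio(smooth))
```

The noise input should come out at a ratio of about 1 or slightly below, since the compressor adds its own framing bytes, while the smooth input compresses by a large factor.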

@Philip-Go
Author

Hello Marc,

thank you very much for sharing the conversion script! It works smoothly and integrates nicely into the execution script. I tested it over the weekend on a large sample of ~3 million points and it worked nicely :) The conversion preprocessing step took only about 30 min, which is short compared to the overall 36 h.

Nonetheless I'm looking forward to the integration of the h5oina read-in in EMsoftOO, as well as to EMsoftOO in general. Keep up the great work and thanks again for your help!

Philip

@marcdegraef
Collaborator

Hi Philip and Dave,

just wanted to let you know that I managed to get the HDF5 library built with plugin support for the lzf compression library. At this point in time, I've only done this for the object oriented version of EMsoft (EMsoftOO, which is version 6.0, currently in beta). I can now read the Oxford *.h5oina format without any errors; that means conversion to .up1 or .up2 format is not necessary. It will take me a while to propagate those changes to EMsoft v.5 but I'm hoping to do that over the next couple of weeks.

Regards, Marc.
