Releases: yambo-code/yambo
Yambo 5.3
Yambo version 5.3 is now ready for production.
This release brings significant advancements, particularly in GPU support, thanks to the contributions of the MaX Centre of Excellence. Below, you will find a comprehensive overview of the updates and improvements introduced in this version:
GPU porting
- The Yambo interface with the devxlib library has been significantly developed. All internal "device subroutines" have been removed from the main source: Yambo 5.3 fully relies on devxlib subroutines (version 0.8.5). This makes it possible to compile yambo with different GPU programming paradigms. While the production GPU porting strategy remains based on CUDA Fortran, it is now also possible to compile and run yambo with OpenACC and OpenMP-GPU.
- The configuration syntax for GPU compilation has changed. Please review the updated documentation for details.
- The Slepc solver now supports GPUs. To utilize this, both Slepc and Petsc must be compiled with GPU support. (Thanks to the slepc team).
GW
- MPA / RIM-W implementation improved.
BSE
- The computation of the BSE kernel is now divided into three distinct steps: exchange, ALDA, and correlation. This allows for independent cut-off settings for each kernel.
- Fixed handling of alpha and epsilon, and Lbar vs Lfull, for lower dimensional systems
- New TDDFT kernel implementation, which is an order of magnitude faster than the previous version
- New implementation of the slepc solver with coupling, which takes into account the pseudo-Hermitian structure of the BSE matrix. (Thanks to the slepc team)
- Dichroism in molecules can now be computed beyond the independent particle approximation (IP) (see https://arxiv.org/abs/2202.12702 for details)
Non-linear optics
- New approximated collisions with only the local part of the exchange Lsex (https://wiki.yambo-code.eu/wiki/index.php?title=A_fast_approach_to_excitonic_effects_in_linear/non-linear_response)
- Collisions compression (reduced memory requirements for collision calculations)
- New PHHg external field
More
- The Yambo driver is now reintegrated into the main source, simplifying the compilation process.
- The configure script has been upgraded to support the latest Intel compilers (2025).
- a2y interface improved. Fixed compatibility with AFM symmetries.
- p2y interface, small fix for l= components of the pseudo-potential (Thanks to Murali)
- memory.h renamed y_memory.h to improve compatibility across architectures.
- internal libraries versions updated
- Rewritten G_m_G subroutine for better efficiency (now converts g-vectors into integers).
- Reorganized dipole subroutines.
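The G_m_G item above mentions converting g-vectors into integers. As a rough illustration of why that helps, and not a transcription of the Fortran routine, the Python sketch below assumes the G-vectors are expressed in units of the reciprocal-lattice basis, so that their components are integers up to rounding. Integer triplets can then serve as exact hash keys when looking up the index of G - G'. All function names here are hypothetical.

```python
def to_int_triplet(g, tol=1e-6):
    """Round a G-vector given in reciprocal-lattice units to an integer triplet."""
    triplet = tuple(round(c) for c in g)
    if any(abs(c - t) > tol for c, t in zip(g, triplet)):
        raise ValueError(f"{g} is not an integer combination of the basis vectors")
    return triplet


def build_g_lookup(g_vectors):
    """Map each integer triplet to its position in the G-vector list."""
    return {to_int_triplet(g): i for i, g in enumerate(g_vectors)}


def g_minus_g(lookup, g_vectors, i, j):
    """Return the index of G_i - G_j, or None if it falls outside the list."""
    gi = to_int_triplet(g_vectors[i])
    gj = to_int_triplet(g_vectors[j])
    diff = tuple(a - b for a, b in zip(gi, gj))
    return lookup.get(diff)


# Toy list of G-vectors (illustrative values only).
g_vectors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
lookup = build_g_lookup(g_vectors)
```

With integer keys the lookup is an exact dictionary access, with no floating-point comparison against every vector in the list.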
Yambo 5.2
Yambo version 5.2 is now ready for production
This new version includes many stabilization improvements, further refining the code structure and modularization, and improving performance and memory usage. Yambo 5.2 is a production version.
Below is a detailed list of changes:
GW
- multi-plasmon pole approximation coded.
For details see: Phys. Rev. B 104, 115157 (2021)
BSE
- Improved IO performances using chunking in HDF5
(This enhancement of the performances will be extended also to other databases)
Dipoles
- When computing overlaps, needed for covariant dipoles, wave-functions are now loaded in pairs of k-points.
This significantly reduces memory usage.
Real time/Non-linear
- Improved IO performances for collisions using chunking in HDF5
- Strong memory usage reduction in non-linear optics calculations yambo_nl
Fixes
- Fixed different compilation issues with nvfortran
- Fixed bug in yambo_nl with openMP parallelization
- Fixed compilation with intel2023
Other changes
- Eigenvalue self-consistent GW (evGW) removed from the Fortran code. It can be performed via concatenated runs (see the evGW wiki)
- Ypp can now plot spin and magnetization factors
- Automatic configuration of HDF5 libraries
Yambo 5.1
Yambo version 5.1 is now ready for production
Yambo 5.0 was released 14 months ago, making almost all features developed within the Yambo code fully available. This includes a whole bunch of new capabilities which were made possible by the support of the MaX project. With version 5.1 the testing and the stabilization was finalized, further refining the code structure and modularization, and improving performance and memory footprint. Yambo 5.1 is a production version, but it also includes new features.
Below is a detailed list of changes:
External Libraries
- Libxc: from 2.2.3 to 5.1.5. New libxc interfaces adopted. Now yambo is able to link recent external libxc libraries
- Petsc updated to version 3.14.6
- Slepc updated to version 3.14.2
- Yambo reads and reconstructs all the info on the pseudo-potential via the QE_pseudo library
Coulomb cutoff
- Added new Coulomb cutoff technique ("slab z") for lower dimensional materials. The "slab z" cutoff has an analytical expression and is compatible with the rim_cut runlevel, for the analytical integration around the G=0 point of the Coulomb interaction
- Added rimW integration of the screened interaction obtained via interpolation functions. Together with the RIM, this makes it possible to significantly speed up the convergence of GW and BSE simulations with respect to the k-point sampling. To be used with the "slab z" cutoff and for 2D materials.
Screening
- Improved handling of anisotropy. For the q → 0 limit, an average over the three Cartesian directions can now be selected in input
- Yambo can now compute screening without SOC and later use it in a calculation with SOC
Dipoles
- Spin dipoles can now be projected in the valence and in the conduction band channel, to study independently the spin dynamics of electrons and holes in real-time simulations. This is controlled via a new input variable.
TDDFT (eh space)
- Added possibility to lower cutoff on Fxc. Useful for comparison with G-space simulations
- Defined F_xc_mat for magnons
- Added support to some hybrid functionals
IO
- Further modularization of the I/O subroutines. io_control and io_connect subroutines have been extracted from mod_IO.F
- Splitting of mod_IO and mod_IO_interfaces
- The qindx_B table can become very large for systems with many k-points. The variable can now be distributed in memory, taking advantage of HDF5 parallel I/O while the table is computed and written. The HDF5 file written to disk is then used as a buffer: when the table is later used, the code reads from it any value not directly stored in memory
- Implementation of IO compression for the excitonic matrix
- Parallel I/O adopted for HXC collisions in real-time simulations and added option to limit collisions to “cv only” channel
- Adopting new upgraded interfaces for io (def_variable and io_variable) in some subroutines of the code
BSE
- Added support for double grid with Haydock solver
- Reorganization of the BSE subroutines:
K_Transitions_setup split into two subroutines;
created K_dipoles and K_IP_sort;
created K_restart file to handle restart, K_multiply_by_V split into two subroutines;
K_driver split into two subroutines;
- Inversion of qindx_B indexes, distribution in memory and parallel I/O (see also section on IO)
Self-consistent module
- Two independent chemical potentials can be added to model the non-equilibrium excitonic insulator
Real time
- Added possibility to perform simulations with two external fields in yambo_nl to model transient absorption experiments
- Coded calculation of ARPES spectral function starting from GKBA reconstruction of G<(t,t’) from density matrix via ypp_rt
- Added calculation of field envelope and extraction of Rabi coupling
- Transient Absorption via ypp restored and greatly improved
- Improved handling of phenomenological dephasing in degenerate subspaces
- Improved OpenMP parallelization for yambo_nl
Electron-phonon
- Electron-phonon self-energy now works with irreducible and expanded gkkp
- Possibility to plot diagonal elements of the gkkp
Performance
- Up to yambo 5.0, the computation of the dipoles in reciprocal space was very demanding for pseudo-potentials with a large number of projectors. Since version 5.1 the algorithm evaluating <nk|[x,Vnl]|mk> has been completely rewritten, drastically reducing both the CPU time and the memory requirements. The new implementation avoids memory transfer between host and device inside the loop over conduction and valence bands when running on GPUs, gaining a factor 6.5 in benchmark tests on rutile
Configure
- Automatic configuration of MKL blas, lapack, scalapack and FFT
- Pnetcdf: interface with parallel-netcdf library coded in the configure. It could replace HDF5. Not yet possible to use them directly. (experimental)
- Improved configuration for NetCDF and HDF5
- Automatic configuration of HDF5 if the HDF5 compilers are in the path
Fixes
- Fixed problems with parallel I/O of dipoles
- Handling of magnetic semiconductors improved by defining different sizes for the number of full and metallic bands in the two spin channels independently
- Fixed compilation of the X_irreduc.F subroutine with some compilers
- Fixed fragmentation indices qindx_B_load
Yambo 5.0
Yambo goes fully GPL
Yambo 5.0 is a major release that comes with many new features and changes in the code. This becomes immediately evident browsing the yambo github repository. So far the GPL repository was a container of the released code extracted from the development repository. Since 2021 and yambo 5.0 the GPL and devel repository are fully in sync (with the minor exception of the dissipation mechanisms in real-time dynamics simulations). Browsing the contributors’ section (https://github.com/yambo-code/yambo/graphs/contributors) the very large activity in the repository is now explicitly shown, with the whole history of commits in the Yambo code.
The new release includes:
- a reorganization of the executables. For example yambo_kerr is now part of the yambo executable and there is no need anymore, for the users, to distinguish the two;
- an improved command line for the generation of the input files;
- the extended release of projects (such as yambo_rt and yambo_ph) for which the gpl version was, so far, limited;
- the release of new projects contained in the self-consistent module of the Yambo code, the yambo_sc executable. The new yambo_sc also includes the “magnetic” and “electric” projects previously known to developers as part of yambo_magnetic and yambo_electric;
- several improvements to the core part of the Yambo code (GW and BSE);
- extended CUDA support.
This is a drastic change in the code release philosophy. Until now, Yambo releases just included features which had been used to produce published results. From now on it will also contain experimental features. A small price to pay is that not all the new features are fully tested or supported. However, we believe that their release can be useful for the progress of the Yambo code, for the users’ community, and for new developments which can, in turn, boost the research in condensed matter physics of the whole scientific community. We are also confident that the Yambo users’ community will provide valuable help in testing and improving such experimental features.
The Yambo Team acknowledges financial support from the MaX Centre of Excellence which made this release possible.
A detailed list of the changes follows, with the experimental features explicitly marked.
Other info can be found in the new Yambo 5.0 cheat-sheet:
https://docs.google.com/presentation/d/1A4uRrCh4qgJhOQimrr_yI9GFFz0xJopWd3dOtiwusts
Main reference: https://iopscience.iop.org/article/10.1088/1361-648X/ab15d0
List of changes
General
New driver
The input file generation keywords have changed: extended words can now be used in place of single letters. For example, the input file for a BSE calculation, which was previously generated with
yambo -b -o b -k sex -y d
can be now obtained with the new command
yambo -X s -optics b -kernel sex -Ksolver d
This was made for different reasons. (i) The number of run-levels in the code is growing and the number of letters available to select them was becoming a limit. Now the driver can handle full words, i.e. -optics in place of -o. (ii) A more logical reorganization of the runlevels. For example, the different kinds of dielectric functions are now grouped under -X (which can be static, dynamic, or plasmon pole approximation), while before they were selected via independent options. (iii) The new options are hopefully more clear for a user.
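The flag change in the example above can be sketched as a small lookup table. The Python snippet below is only an illustration: the mapping contains exactly the four correspondences visible in that single BSE example, and the helper name `translate` is hypothetical. It is not a complete description of the new driver, and other runlevels are deliberately left out.

```python
# Old-style token sequences and their new equivalents, taken only from the
# single BSE example in the text; other runlevels are deliberately not covered.
OLD_TO_NEW = {
    ("-b",): ["-X", "s"],
    ("-o", "b"): ["-optics", "b"],
    ("-k", "sex"): ["-kernel", "sex"],
    ("-y", "d"): ["-Ksolver", "d"],
}


def translate(old_command):
    """Rewrite an old-style yambo command line using OLD_TO_NEW."""
    tokens = old_command.split()
    out = [tokens[0]]  # executable name is kept unchanged
    i = 1
    while i < len(tokens):
        # Try a flag with its argument first, then a bare flag.
        for width in (2, 1):
            key = tuple(tokens[i:i + width])
            if key in OLD_TO_NEW:
                out.extend(OLD_TO_NEW[key])
                i += width
                break
        else:
            raise ValueError(f"no known translation for {tokens[i]!r}")
    return " ".join(out)


print(translate("yambo -b -o b -k sex -y d"))
```

Any flag outside the table raises an error instead of being guessed.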
As part of the modularization of the Yambo code, one of the goals is to split the source into different blocks (or modules), and eventually promote them into libraries. The new driver has been used to create a first experimental library. The source code is no longer contained in the main repository of the yambo code, but has been moved into a submodule:
https://github.com/yambo-code/yambo-libraries
This is the latest release of the driver: driver-0.0.2.tar.gz
Configure & Makefile
The Yambo configure has been reorganized to simplify the logic. The source has been strongly modularized. The config folder is now divided into two main subfolders which include many .mk and .m4 files. This change is mostly of interest to developers.
External libraries (experimental feature)
Yambo is now interfaced with libraries “FUTILE” and “YAML” provided by the BigDFT community within the MaX CoE. Moreover we have upgraded the version of the external libraries which are automatically downloaded by the yambo configure. This change was particularly important for the Slepc and Petsc libraries since it implied changes in the calls to the functions provided by such libraries in the yambo source code.
Cuda support
Handling of GPU support in the sources has been further streamlined and simplified; calculation of DIPOLES has been extensively revised resulting in a significant performance gain; solution of the Dyson equation for W in X_redux can now be selectively performed on CPU or GPU (see var ChiLinAlgMod).
Performances
The performance for the calculation of DIPOLES on GPU has been significantly improved (also leading to improvements on the CPU-only time-to-solution); a few other bottlenecks have been identified and solved. Overall, the improvements of the code since Yambo 4.4 can be exemplified by the case of H-rutile (a supercell of 72 atoms of rutile TiO2 plus one H-interstitial defect, studied at the GW level).
Figure 1. Left panel: baseline data referring to runs performed on Marconi-A2 KNL (Intel Xeon Phi Knights Landing, 68 cores/node) in early 2019 for rutile-H. Right panel data for the same system (October 2019).
Figure 2. Yambo performance (time-to-solution and speedup) as a function of the number of nodes for rutile-H. Left panel: yambo 5.0 runs on Marconi-A3 (Skylake partition). Right panel: yambo 5.0 run on Marconi-100 (IBM-P9 + 4 NVIDIA V100 GPU cards).
In Figure 2 we are plotting some recent datasets obtained for the same use case (rutile-H) on Marconi-A3 (Skylake partition, M-SKL, 48 cores/node) as well as on Marconi-100 (M-100, equipped with 4 NVIDIA V100 per node). In both cases the scalability is quite good. Interestingly, on Skylake a time-to-solution of about 2000 seconds is obtained on 40 nodes, at variance with KNL where the same timing required at least 128 nodes. Even more so when considering the GPU-accelerated case: here the smallest partition considered (5 nodes) already gives the shortest time-to-solution compared to previous machines (~1000 sec), further reducing it with good scalability down to about 200 seconds. Note that 40 nodes of M-100 correspond already to a quite large computational partition of about 1.2 PFlops (which is exploited with performance by a GW calculation on a rather small system).
I/O
By default the I/O is now done in NETCDF-4 format, based on HDF5. It is still possible to switch back to NETCDF-3 in most cases. Only real time simulations, so far, explicitly require NETCDF-4 file format. It can be also requested to print output files in netcdf-format (--enable-netcdf-output).
The core io_subroutines of yambo have been moved to a separate folder. This is a first step towards the realization of an IO library for yambo.
Descriptors have been added to handle the metadata associated with the yambo databases. The info contained in the descriptors are also reported in the report of simulations performed with the yambo code.
Doxygen documentation (work in progress)
We started setting up files for doxygen documentation. Hopefully documentation will be soon available.
Memory handling
The memory driver has been extended with an improved report on the memory usage for the users. An experimental check on memory allocation on the node is now available for parallel runs.
yambo
Dipoles (partly experimental features)
The dipoles runlevel has been extended. It now includes the capability to compute, beyond the optical dipoles, also the "spin dipoles" needed to compute magnons. Also, for isolated systems a preliminary implementation of the "orbital dipoles" is available.
Double grids support
The support to the double k-points grid has been revised and improved. The user can now use the double grid for different runlevels: dynamical response function and screening in G-space, BSE with inversion solver, real time dynamics (see yambo_rt). The mapping can be done in different flavors.
Coulomb cutoff
The Coulomb cutoff technique for systems with reduced dimensionality now supports a Wigner-Seitz shape for 2D, 1D and finite systems in orthorhombic cells.
Self-consistent GW on eigenvalues (evGW)
In Yambo 5.0 self-consistent GW on eigenvalues is implemented, either on G only or on both G and W. For more information see the dedicated tutorial:
http://www.yambo-code.org/wiki/index.php?title=Self-consistent_GW_on_eigenvalues_only
ACFDT total energies (highly experimental feature)
The Adiabatic Connection Fluctuation and Dissipation Theorem (ACFDT) total energy driver and subroutines are now part of the source code. It is an old feature that was never released, nor maintained.
Kerr effect and magneto-optical properties
The magneto-optical properties, which were previously obtained via the ya...
Yambo 4.5
With Yambo 4.5 support to CUDA Fortran has been implemented
- Yambo structure modified to deal with GPU accelerator devices; porting done using CUDA Fortran (available with the PGI compiler);
- DIPOLES, RESPONSE FUNCTION, HF, GW, BSE have been ported;
- fully compatible with MPI and OpenMP; typically, 1 MPI/card, OpenMP threads used to exploit the remaining computational capabilities of the host.
- inclusion of dedicated headers (dev_defs.h) to handle simultaneously the CPU and GPU compilation.
- GPU allocations integrated in YAMBO_ALLOC/YAMBO_FREE and memory module.
- DevXlib (developed jointly with the QE team and hosted as a separate repo on GitLab) imported and extensively used to provide wrappers for memcpy, sync, init, and simple data operations.
New DIPOLES_driver
As part of the modularization process of the code, within the MAX project, all the subroutines dealing with the calculation and the I/O of the dipoles have been moved under the folder src/dipoles and the "dipoles" runlevel has been created. The DIPOLES_driver is not called directly by the yambo_driver. This made possible the creation of a dedicated parallel scheme for the dipoles and thus a more efficient distribution of the calculation (both time-to-solution and memory footprint). Later, other runlevels just need to load the pre-computed DIPOLES from disk. This also avoids strong load imbalance, for example in the calculation of the response function, where the dipoles are needed only at q=0.
Modularization of the BSE subroutines.
The files
K.o K_correlation_collisions.o K_exchange_collisions.o
have been split into
K.o K_correlation_collisions.o K_exchange_collisions.o K_correlation_kernel.o K_exchange_kernel.o K_screened_interaction.o
This reduces code duplication and makes CUDA and OpenMP directives easier to handle.
Moreover the code is ready for a finite-q BSE implementation, which will likely be made available with the next release.
More:
- Reorganization of the main yambo_driver. The main subroutine of the code has been cleaned and reorganized to make new features easier to implement;
- p2y can now also read the output of the projwfc.x post processing (QE suite);
- improved configuration of external libraries;
- new mapping of the k-points introduced. It can be useful for gamma centered grids in hexagonal cells, when the standard mapping may fail;
- subroutine G_index_energy_factor introduced;
- IO of some tables moved from integer/real to character to reduce disk use in real-time calculations;
- general improvements in coulomb_cutoff for reduced dimensionality systems;
- Modularization of the subroutines dealing with the input file. Subroutine
src/interface/INT.F
split into
INIT.o INIT_read_command_line.o INIT_check_databases.o INIT_activate.o ;
- Several bug-fixes.
Yambo 4.4
See Journal of Physics: Condensed Matter 31, 325902 (2019) for details
https://iopscience.iop.org/article/10.1088/1361-648X/ab15d0
Parallel IO and BSE restart
The BSE matrix can now be written in a single file if parallel I/O is activated at configure time (--enable-hdf5-par-io). Doing so the calculation of the BSE matrix can be restarted also after a crash.
New interface with the abinit code based on NETCDF
The new interface directly reads the NETCDF WFK files. Abinit needs to be compiled with NETCDF I/O flavour and the variable prtkbff 1 must be set in the abinit input file. Thus the new interface no longer needs the KSS file or the ETSF/IO library. Moreover the sphere of G-vectors is automatically expanded to the higher cutoff needed for the density (and for other operations in the yambo code).
Improved interface with Quantum Espresso code
The p2y interface has been strongly developed and remodularized. Now p2y correctly works also with a set of hybrid functionals and with QE up to version 6.4.1
Yambo real time module
The module for running real time simulation has been significantly extended. It is now possible to select the desired level of approximation in the input file from TD-IP to TD-SEX. Moreover the parallelization over k-points has been significantly improved.
New Interpolation driver
The old interpolation routines in ypp have been re-organized and modularized (now there is an INTERPOLATION_driver). Two methods (NearestNeighbour and Boltzmann) are coded.
More: general
- Fixed definition of dielectric function in presence of Coulomb cutoff
- folder bin-libs renamed lib/bin
- External libs clean-up improved
- debugging flags coded in acx_fortran_flags
- SET_job_strings_and_dirs.F routine coded to handle new job string format. Example -J path1/path2/job,path3/jobp
- "js" removed from SET_defaults
- Closing operations moved into CLOSE_the_run routine
- GLOBAL renaming
-- OP_APP_CL -> OP_APP_WR_CL [DS: Is it the other way around?]
-- stop_now -> STOP_now
-- string_remove -> STRING_remove
-- string_pack -> STRING_pack
-- SC_R -> H_rotation
- STRING_match routine added to compare strings ignoring upper/lowercase.
- PARSER now re-organized with specific file names (PARSER_*.F)
-- Parsing of arrays better coded. Now they support memory.
More: Yambo
- XC_potentials added to handle the definition of kernel potentials (collisions)
- QP_state_extract_print added to modularize the QP printing operation.
- OBS_rotate added and ready to be used.
-- H_rotate moved to separate routine
-- created a real_time_hamiltonian folder (includes RT-Hamiltonian related operations)
- Added FC-ORTHOROMBIC to crystal lattice.
- Added more PARALLEL logicals
-- logical :: HEAD_q_cpu =.TRUE.
-- logical :: HEAD_b_cpu =.TRUE.
- Now the input file is copied (adding the JOB identifier) when a com directory is different from the PWD.
- INIT_report_and_log_files now defines log/report filenames
- RT components now stored in an OBSERVABLE database.
-- JPC_RT_IO_t -> OBS_RT_IO_t
-- JPSM -> OBSERV
-- Complex io_bulk used in io_RT_components
- RT_output now made modular and split in several parts:
-- mod_RT_output
-- RT_FILE_add
- YPP interpolate routines made available to the main code moving interpolate in src
More: YPP
- ypp/init -> ypp/interface
- YPP "RTDBs" manual removed
- DEEP revision of all RT-related operations
-- Now INTERPOLATION using BoltzmanFourier functions works in all sections (RT-occs, RT-dos)
- Also ypp now prints runlevel specific log and report files
Yambo 4.3
List of new features:
- implemented USPPs in YAMBO and created a new prototype library, qe_pseudo;
- improved the PWscf to YAMBO interface, p2y;
- fully supported parallel linear algebra and implemented a new level of parallelism for the irreducible response functions;
- implemented in YAMBO the support for SLEPc and PETSc libraries;
- calculation of dipole matrix elements with alternative approaches;
- non-linear optics from time propagation;
- Quasiparticle corrections can now be calculated starting from hybrid functionals.
Extension of the p2y interface with QUANTUM ESPRESSO.
Within YAMBO, p2y is the main interface with QUANTUM ESPRESSO, reading, checking, and importing all data (such as crystal, electronic structure info, and wavefunctions) produced by the preliminary DFT steps. We have extended p2y capabilities by implementing:
(i) auto-recognition of the QUANTUM ESPRESSO data formats;
(ii) parallel execution of p2y;
(iii) handling of non-linear core corrections (NLCC) and USPPs data.
In the past the user had to provide at configuration level the format of DFT data to interface with, internally handled by precompiler macros. If different data formats were used, YAMBO had to be recompiled. While supporting multiple QE formats (qexml, qexsd, qexsd+hdf5), (i) we have added to p2y the capability of recognising the format, thereby superseding the need of user intervention. Moreover, (ii) p2y can now be run in parallel (parallelism is expressed at the MPI level over conversion of wavefunction data blocks). This is particularly relevant when large scale systems are addressed, resulting in wavefunction volume easily larger than 100 GB. Together with the implementation of USPPs, we have also supported the use of NLCC (iii), a pseudopotential option that broadens the class of potentials YAMBO can deal with.
Support to USPPs:
Norm-conserving pseudopotentials (NCPPs) are used in YAMBO to describe valence electrons. NCPPs can show high transferability but, due to the norm conservation constraint, may also require the use of a large plane-wave basis set to obtain suitable pseudo-wavefunctions. In these conditions, GW calculations can be computationally demanding. In order to reduce the computational cost induced by the use of a high kinetic energy cutoff, we have implemented support for USPPs in YAMBO. USPPs lead to a reduction of the plane wave basis set used to describe the pseudo-wavefunctions. However, the implementation requires the evaluation of augmentation terms (often non-trivial to compute) that have to be included to account for wavefunction normalisation and, in turn, to properly evaluate matrix elements and observables.
The YAMBO implementation of USPPs has been achieved in three steps. First, (i) we have generated the external library (qe_pseudo). Then (ii) the library has been imported in YAMBO and the main interface layer built (mostly the initialisation of the USPP data). Finally, (iii) new routines specific to YAMBO (such as the augmentation of the matrix elements entering the response function) have been implemented.
For coding the interface with USPPs, we have taken advantage of the implementation already existing in QUANTUM ESPRESSO. Some routines (mainly concerning the reading and handling of USPPs data and the calculation of augmentation contributions) have been extracted from QUANTUM ESPRESSO and cast into the form of an external library (qe_pseudo), currently in beta form. The library is currently shared with YAMBO. As a follow up step, this module will be further developed as a self-standing library and made usable via well defined and public APIs. This is part of the long standing modularization work carried on within MAX.
Parallel linear algebra distribution of response functions.
One of the most serious bottlenecks that is common to many of the calculations performed by YAMBO is the accumulation and storing in memory of the frequency dependent dielectric matrix X. This is commonly obtained as solution of the Dyson equation for the response function. In YAMBO this is written in reciprocal space, therefore its dimension is directly connected to the number of plane waves used to represent the system under study. In the latest release we implemented an advanced slicing of the response function that is efficiently distributed in memory and workload. This slicing is peculiar to YAMBO and we coded ad hoc interfaces between the slicing and the blacs/scalapack structure. In this way all steps of the Dyson equation are distributed in memory and, together with the use of the scalapack library, the dielectric matrix is never fully allocated in memory. This new feature allows us to overcome one of the most serious drawbacks of Many-Body ab-initio calculations.
SLEPC & PETSC support.
Solving the BSE implies the solution of eigenvalue problems for the two-particle Hamiltonian, which in the e-h basis can be a matrix as large as 10^6 × 10^6. The standard full matrix diagonalisation algorithm is available in YAMBO through the interface with the LAPACK and the ScaLAPACK libraries. Alternatively, when only the spectrum is required, YAMBO provides the iterative Haydock-Lanczos solver. The iterative approach is much faster, however it does not provide information on the excitonic wave-functions. Recently, YAMBO has been interfaced with the SLEPc library, which uses objects and methods of the PETSc library to implement Krylov subspace algorithms to iteratively solve eigenvalue problems. These are used in YAMBO to obtain selected eigenpairs of the excitonic Hamiltonian. This allows the user to select a fixed number of excitonic states to be explicitly calculated, avoiding the full diagonalisation and thus saving a great amount of computational time and memory. The SLEPc solver makes it possible to obtain and plot exciton wave functions in large systems where the full diagonalisation might be computationally too demanding.
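To illustrate why an iterative Haydock-Lanczos scheme yields the spectrum without the eigenvectors, here is a minimal pure-Python sketch. It is a textbook Lanczos recursion on a small real symmetric toy matrix, which stands in for the excitonic Hamiltonian; it is not YAMBO's implementation, and the function names and matrix values are assumptions made for the example. The recursion produces coefficients a_n and b_n, and the matrix element <P|(ω - H)^-1|P> is then evaluated as a continued fraction.

```python
def matvec(h, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos_coefficients(h, start, n_steps):
    """Return the Lanczos coefficients (a_n, b_n) for a real symmetric h."""
    norm = dot(start, start) ** 0.5
    v = [x / norm for x in start]
    v_prev = [0.0] * len(v)
    a, b = [], []
    beta = 0.0
    for _ in range(n_steps):
        w = matvec(h, v)
        alpha = dot(v, w)
        a.append(alpha)
        # Three-term recurrence: orthogonalise against the last two vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = dot(w, w) ** 0.5
        if beta < 1e-12:  # Krylov space exhausted
            break
        b.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return a, b


def resolvent(a, b, omega):
    """Evaluate <P|(omega - H)^-1|P> as a continued fraction, bottom up."""
    value = omega - a[-1]
    for n in range(len(a) - 2, -1, -1):
        value = omega - a[n] - b[n] ** 2 / value
    return 1.0 / value


# Toy 3x3 "Hamiltonian" and starting vector (illustrative values only).
h = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
a, b = lanczos_coefficients(h, [1.0, 0.0, 0.0], 3)
# Spectral function on a frequency grid, with a small broadening of 0.05.
spectrum = [-resolvent(a, b, w / 10 + 0.05j).imag for w in range(0, 60)]
```

Only the coefficients are stored, so the eigenvectors are never built. This is the trade-off described above: the spectrum comes cheaply, while excitonic wave-functions need an eigensolver such as the SLEPc one.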
Non-linear optics.
With the last YAMBO release we made available a real-time approach to compute linear response absorption spectra via the solution of the Bethe-Salpeter Equation in real time, i.e. by propagating the one-body density matrix. In the present release we also make available a real-time approach to compute non-linear response functions, via the time propagation of single particle wave-functions. The approach is part of the official YAMBO release thanks to the merge into the YAMBO project of the developments which had been previously released in the Lumen code.
Alternative formulation and computation of dipole matrix elements.
The dipole matrix elements $r_{nm\mathbf{k}} = \langle n\mathbf{k}|\hat{r}|m\mathbf{k}\rangle$ are needed in YAMBO to compute the absorption spectrum and can be evaluated using the relation $[r, H] = p + [r, V_{nl}]$. Their calculation is quite demanding, due to the $[r, V_{nl}]$ term, and its implementation has been strongly optimised and extended to account for projectors with angular momentum $l > 2$. We have also made available alternative strategies. The shifted grids approach uses wave-functions computed on four different grids in k-space, i.e. a starting k-grid plus three grids $\mathbf{k} + q\,\mathbf{e}_i$ slightly shifted along the three Cartesian directions $\mathbf{e}_x$, $\mathbf{e}_y$, $\mathbf{e}_z$. This approach is computationally more efficient, although it requires generating more wave-functions. The covariant approach exploits the definition of the position operator in k-space: $r = i\partial_k$. The dipole matrix elements are then evaluated as finite differences between the k-points of a single regular grid. A five-point midpoint formula is used, with a truncation error $O(\Delta k^4)$. For finite systems, finally, the dipole matrix elements can be directly evaluated in real space (R-space x approach).
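As a generic illustration of the five-point formula and its fourth-order truncation error mentioned above, the Python sketch below applies the standard five-point central difference to an ordinary scalar function. It says nothing about how YAMBO applies the stencil to wave-function overlaps on the k-grid; the function name and the test function are assumptions made for the example.

```python
import math


def five_point_derivative(f, x, h):
    """Five-point central difference for f'(x); truncation error is O(h**4)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


# Check the convergence order on a function with a known derivative.
x0 = 1.0
error_coarse = abs(five_point_derivative(math.sin, x0, 0.10) - math.cos(x0))
error_fine = abs(five_point_derivative(math.sin, x0, 0.05) - math.cos(x0))
# Halving the step should reduce the error by roughly 2**4 = 16.
ratio = error_coarse / error_fine
```

In the covariant scheme the step corresponds to the spacing of the regular k-grid, so a fourth-order error means that doubling the grid density along one direction cuts the finite-difference error by about a factor of 16.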
GW calculations starting from hybrid functionals.
In the present release we have added to YAMBO the possibility to compute GW corrections starting from a ground state calculated using hybrid functionals. This feature gives better precision in GW calculations for materials where a fraction of exchange is needed for a good description of the ground state. The implementation consists of: (i) reading the parameters of the xc functional from the QUANTUM ESPRESSO output by using the new p2y interface (see sec. 6.3), (ii) calculating the expectation value of the exchange-correlation potential using the libXC library, and finally (iii) removing the divergence in the Fock term. Point (iii) has been realised by adding to YAMBO the possibility of truncating the interaction beyond the Wigner-Seitz supercell. The same scheme is used in the QUANTUM ESPRESSO code.
Yambo 4.2
Memory Support
A dedicated and centralised procedure for memory allocation/deallocation and tracking has been designed to improve control over memory usage.
All allocation and deallocation statements in the code have been replaced. Performance issues are avoided by exploiting Fortran/C preprocessor directives, which keep the allocations/deallocations inlined. Fortran pointers have been replaced by allocatable arrays where possible, with a small but noticeable performance improvement.
SLK new interface
Dense parallel linear algebra has been implemented by exploiting the ScaLAPACK (SLK) library within the MPI parallel structure of YAMBO. Concerning the RPA response, for instance, this means that on top of the MPI parallelism over q vectors, multiple SLK parallel linear algebra instances are run at the same time (one per q vector).
In the specific case of the response function, we have also reformulated the initial linear algebra problem as the solution of a linear system, with the aim of reducing communication.
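The text does not spell out the equations, so the following is only one common way such a reformulation can be written, given as an assumption. Starting from the RPA Dyson equation for the response function $\chi$, with independent-particle response $\chi^0$ and Coulomb interaction $v$, the explicit inverse can be replaced by a linear system:

```latex
\begin{align}
  \chi &= \chi^0 + \chi^0 v \chi
  && \text{(RPA Dyson equation)} \\
  \chi &= \left[ 1 - \chi^0 v \right]^{-1} \chi^0
  && \text{(explicit matrix inversion)} \\
  \left[ 1 - \chi^0 v \right] \chi &= \chi^0
  && \text{(linear system solved for } \chi \text{)}
\end{align}
```

In the last form the matrix $1 - \chi^0 v$ is factorised once and the system is solved with $\chi^0$ as the right-hand side, without ever forming the inverse.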
QP-DBS operations:
This new tool performs all kinds of simple arithmetic operations on QP databases. It is now possible to merge databases coming from different calculations and perform operations such as addition, subtraction and multiplication. This tool turns out to be very useful in calculations of QP properties of large systems, where it allows one to split a large run (many k-points and many bands) requiring high memory and CPU resources into several independent and less demanding runs.
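The idea of merging and combining QP databases can be sketched with plain dictionaries. The data model below is purely hypothetical (a mapping from a (k-point, band) pair to a QP correction in eV) and does not reflect the actual database format or the interface of the tool.

```python
import operator


def merge_qp(*dbs):
    """Merge QP tables from independent runs; overlapping entries are an error."""
    merged = {}
    for db in dbs:
        overlap = merged.keys() & db.keys()
        if overlap:
            raise ValueError(f"duplicate QP entries: {sorted(overlap)}")
        merged.update(db)
    return merged


def combine_qp(db_a, db_b, op):
    """Apply a binary operation entry by entry on the keys both tables share."""
    common = db_a.keys() & db_b.keys()
    return {key: op(db_a[key], db_b[key]) for key in sorted(common)}


# Two hypothetical runs, each covering one k-point and two bands.
run_1 = {(1, 4): 0.50, (1, 5): 1.25}
run_2 = {(2, 4): 0.75, (2, 5): 1.50}
full = merge_qp(run_1, run_2)

# Add a second set of corrections defined on the same (k-point, band) pairs.
extra = {key: 0.25 for key in full}
total = combine_qp(full, extra, operator.add)
print(total)
```

Splitting a large run by k-points or bands and merging the results afterwards follows the same pattern as `merge_qp`, while `operator.sub` and `operator.mul` give the other arithmetic operations.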
Real-Time BSE
A real-time approach for the solution of the BSE has been implemented to overcome problems connected with the storage of the Bethe-Salpeter Kernel (BSK). The new approach is based on a reformulation of the problem using non-equilibrium MBPT, which reduces the solution of the BSE to a real-time problem where the kernel is never actually calculated. Yambo can now directly calculate the polarization function in real time.
P2Y
QUANTUM ESPRESSO has implemented a new data layout, featuring a schema-compliant XML format combined with either Fortran or HDF5 binaries (if available) to store massive data (like wavefunctions). Though the previous data layout is still available and operational, in view of the tight data exchange between QUANTUM ESPRESSO and YAMBO, we have implemented the support for the new format. This comes as a native and independent software library, which has the advantage of being simple to install and flexible to use. This library can in principle be used by other third party codes interested in interfacing with QUANTUM ESPRESSO.
WF-IO
We have changed the structure of the ns.wf fragments generated by the interfaces (a2y, p2y, etc.). The new implementation reduces the time and memory consumption of the procedure that reads KS wave-functions (ns.wf databases) imported from other DFT codes. With the new version each core needs to read only the fraction of the fragment it needs. With this upgrade both the memory and the time needed now scale with the number of MPI cores used. This new feature becomes crucial when running on big databases and thousands of cores. Note that old wavefunction databases are no longer compatible with the current release. However, a tool to convert databases between releases is provided.
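A minimal sketch of how a fragment could be split so that each core reads only its own share. The even block distribution and the function name are assumptions made for illustration; the actual distribution used by the code is not described here.

```python
def local_range(n_items, n_ranks, rank):
    """Half-open [start, stop) range of items owned by `rank` in a block distribution."""
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks take one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Hypothetical fragment with 10 wave-function components read by 4 MPI tasks.
ranges = [local_range(10, 4, rank) for rank in range(4)]
print(ranges)
```

Each task holds at most `ceil(n_items / n_ranks)` items, which is why both the memory and the reading time per core decrease as more cores are used.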
New interface for IO of complex variables improved and widely used in the code
The interface to print complex variables to file has been further improved, thanks to the use of pointers. The new interface has been adopted for the I/O of all complex variables, with a substantial simplification of many I/O subroutines. Note that many databases generated with previous yambo releases are no longer compatible with the current release.
Modularization of configure
The main source for the autoconf tools, "config/configure.ac", was reduced from 464 lines down to 270 lines by moving part of the source into files specific to libraries and configuration flags. The new simplified structure makes it easier to introduce dependencies on new libraries, including the automatic management of either the linking of external libraries or the download of internal ones (see for example the internally available HDF5 library in the following section “Interfaces with NetCDF4/HDF5 for advanced IO”).
Extension of the number of libraries internally supported
With the new structure of configure, the number of internally supported libraries grew from 4 (iotk, etsf-io, netcdf, and libxc) to 12, with 8 new libraries now natively supported (blas, lapack, scalapack, blacs, petsc, slepc, fftw, and hdf5) and the upgrade of netcdf to the most recent version (4.4), which is made of 2 separate libraries for C and Fortran (netcdf and netcdff). All these libraries can now be automatically downloaded and compiled when running "make yambo". The same libraries can then be linked in future compilations of YAMBO or even by other codes. We tested the system with different architectures and compilers.
Interface with FFTXLib domain libraries
In the 4.2 release we have modularised the calls to the FFT kernel made in YAMBO and implemented an interface with FFTXLib, developed within the MaX Centre of Excellence as a materials science domain-specific library. This library is now distributed with YAMBO (under the name of FFTQE) and can be called by the main code. Only a subset of the features implemented in FFTXLib is currently supported, though a more extended interface is planned.
Response function and Green’s function terminators
In order to reduce the number of empty states necessary to converge both the polarizability and the GW self-energy, we have implemented in YAMBO both response function (X) and self-energy (G) terminator techniques, following the scheme proposed by F. Bruneval and X. Gonze [1]. X and G terminators can be exploited to accelerate GW convergence, leading to a reduction of both memory usage and time-to-solution. Terminator algorithms do not affect the scalability and parallel performance of YAMBO.
Interface with the WanT package for the interpolation of GW results
In the current release Yambo has been interfaced to the WanT package (http://www.wannier-transport.org) to exploit the real-space tools based on pseudo-atomic orbital projections [2] and Wannier functions [3,4] to Fourier interpolate (real and complex) band structures and densities of states in the framework of GW. This feature is an alternative to the ypp tool for band interpolation already present in the previous release, which is based on polynomial interpolation. It is very useful when the ypp interpolation presents difficulties or shows noisy bands.
[1] F. Bruneval and X. Gonze, Phys. Rev. B 78, 085125 (2008)
[2] L. A. Agapito, A. Ferretti, A. Calzolari, S. Curtarolo, and M. B. Nardelli, Effective and accurate representation of extended Bloch states on finite Hilbert spaces, Phys. Rev. B 88, 165127 (2013)
[3] N. Marzari and D. Vanderbilt, Maximally localized generalized Wannier functions for composite energy bands, Phys. Rev. B 56, 12847–12865 (1997)
[4] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, and D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84, 1419–1475 (2012)
Yambo 4.0
Starting from version 4.0 Yambo implements a massive parallelization scheme based on a pyramidal organization of CPUs among many different levels.
See http://www.yambo-code.org/doc/parallel for more information and check the new http://www.yambo-code.org/tutorials/Parallel/ tutorial for a first practical introduction to the new code functionalities.
This version is in devel status. This means that, in the next months, we will fix all bugs and malfunctions reported by the users. When the new source is stable enough it will become the new stable version.
Compared to the stable version, the new yambo devel release includes many changes, bug fixes and new structures. At a deep level, the old source has been thoroughly restructured.
Additions:
Yambo 4.0.0 can now be linked to a wealth of external libraries. Several common and/or vendor-specific FFT libraries are now supported. In addition, the standard NETCDF, libxc, ETSF-IO, and IOTK libraries can now be automatically downloaded, configured, and installed during the compilation of Yambo without any extra effort from the user.
Changes:
The new parallel structure is based on a brand new MPI+OpenMP environment. This has required several deep changes to the code structure. These changes now allow Yambo to implement:
- an efficient workload distribution, up to (tens of) thousands of cores
- an efficient memory distribution
- a controllable I/O that can even be reduced to no databases written (except the output files) and only a minimal reading of the wave-functions (via a buffering system)
The parallelism works over several different levels (4 MPI levels over physical indices such as k and q points, conduction and valence bands, etc., and one OpenMP level distributing the loops over G vectors and/or real space grid points).
Yambo 3.2.5
Bugs:
- In k_lattice. This is an old issue related to the search of the unit vectors of the BZ grid. I had already changed this part in 2010 but, apparently, it does not always work, as in the surface case of the Fantastic Dimensions tutorial. Therefore I decided to merge the old and new methods in a new procedure that, hopefully, will always work.
- In G_shells_finder.F
- In QP_ctl_load.F
- Small bug fix in O_driver for out-of-the BZ calculations
- In QP_ppa_cohsex wrong number of elements in the live_timing call. This bug was already spotted in previous commits. Therefore this new bug-fix must be checked carefully.
- In the msg routine large real numbers with exponents larger than 10 were incorrectly printed
- Bug in the k_grid generator when the number of k-points > 1
- In GW the expected time in the log was not computed correctly, so the live timing was not working for the G0W0 PPA part
- Fixed issues with gfortran compiler
- Bug on the routine that controls the loading of user defined QP databases
Additions:
- Possibility to link FFT libraries other than FFTW, for example the MKL one (see http://www.yambo-code.org/doc/compiling.php)
- Added a routine (variables_X) to handle the description strings of the response function
- ERIM for the BSE moved to GPL
- eval_minus_G.F. In order to do BSE calculations with fractional occupations the coupling is necessary. Before this commit, however, non-resonant calculations were possible only by using the Time-Reversal. This condition has been removed by introducing a table to find the indexes of the -G vectors.
- Introduced computer space units: Kilobyte, Megabyte, Gigabyte
- [A2Y] Support to read information for HGH pseudopotentials from a patched version of abinit.
- [A2Y] Added support for pseudopotential projectors.
- Added PARALLEL support to p2y. It can now run in parallel.
- [P2Y] Added several PW-based routines to import the commutator of the non-local pseudo.
- A new option for p2y (-c)
- Fragmentation of db.dipoles and db.kindx.
- ioKB_PP.F: the db kp_bb is fragmented when -S is given in the command line. Needed to overcome the netcdf limitation in the max # of vars (8129). Large grid calculations (even below the netcdf restrictions) were also found to be sped up by this option.
- BSEEhEny moved to GPL
- Wrong allocation of BSS_rhoq0 when BS_eh_W is present
- The logical which commands the parallel I/O was wrongly defined. Errors in parallel I/O after the call to the IO_and_Messaging_switch fixed.
- Wrong number of energy steps in COHSEX
- Frequency dependent correction to the LRC TDDFT kernel. By S. Botti (Phys. Rev. B 72, 125203 (2005))
Changes:
- GLOBAL COPYRIGHT YEAR RANGE change
- Various typos fixed
- Partial fix of optical calculations using shifted grids. Eps0 calculated with the shifted grids agrees with the calculation with the non-local part of the pseudo. There is still a large prefactor (~70) I could not find the reason for. The generation of the grids by ypp has also been changed. *** this piece of code is still to test and improve ***
- modules/mod_matrix_operate.F: new function m3inv_c which inverts a complex matrix
- The inversion map of the RL is not saved in the gops databases. This is to avoid the calculation of the map, which can be cumbersome in large systems.
- Added the call to the Blas wrapper to avoid problems with stupid compilers in qp/XCo_local.F
- GLOBALLY renamed the unit conversion factors following a pre-established format: U = U2P * P. So, for example, 1 Hartree = 27.21 eV => AU2EV = 27.21, 1 Bohr = 0.53 A => AU2ARM = 0.53, and so on.
- Numerical input variable of QP_apply changed to character
- More digits in memory report
- Removed logical l_a2y_KBPP from the a2y interface for reading KSS without psps, now that a new version able to read also pseudos with many projectors exists.
- The old flag of p2y (-g, Force use of all available RL vectors) now changed so as to load a certain number of G-vectors.
- Interfaces do not call section at the beginning. Instead a simple msg is used.
- Structure of header files and drivers made C compliant
Additions:
- Use nc-config for library flags in order to fix an incompatibility with NetCDF 4.12
- Parallelization of QP_ppa_cohsex.F changed in order to work also for cohsex done with only full bands
- New tool to create and manage ndb.QP, ypp -q
- HDF5 support
- C routine to scan the subfolders of a given folder
- Frequency dependent correction to the LRC TDDFT kernel. By S. Botti (Phys. Rev. B 72, 125203 (2005))
- *** On-the-fly database to RESTART interrupted GW calculations ***
- E-RIM moved to GPL