Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting "arbitrary" properties #43

Open
rkurchin opened this issue Mar 26, 2022 · 32 comments
Open

Supporting "arbitrary" properties #43

rkurchin opened this issue Mar 26, 2022 · 32 comments

Comments

@rkurchin
Copy link
Collaborator

This has come up a couple of times now. Examples could be:

  • a charge associated with every atom (atom-wise property, scalar)
  • a dipole moment associated with a system (system-level property, vector)
  • ...and plenty of others.

@mfherbst, I recall a discussion of a relatively flexible/generic way to do this at some point but now can't find it, or recall the details of the proposal, but since this came up again recently over at FermiQC/Fermi.jl#120 (comment), I figured I'd file an issue here for discussion.

@mfherbst
Copy link
Member

mfherbst commented Mar 26, 2022

Atom-level properties already work (see Atom struct). Basically just overload getproperty and hasproperty. IMHO there is nothing additional needed for that. Same thing could be done with an identical mechanism for the system. Again no need to define an extra interface as Base already provides this.

I think the only thing we have to do in AtomsBase is to agree (and document) the names of the standard keys and their expected size / type of the returned fields.

@mfherbst
Copy link
Member

mfherbst commented Mar 26, 2022

In light of what I just said, let's brainstorm a little on what people would need / are already using (feel free to amend this post):

System

  • ...

Atom

  • magnetic_moment: Number or vector of three floats (used in DFTK)
  • pseudopotential: String, identifier or file name of a pseudopotential file (used in DFTK)

@rkurchin
Copy link
Collaborator Author

🤦‍♀️ that's why I couldn't find the atom-level one in an issue, because we already did it. Haha, my bad!

This approach sounds great. I suppose charge and dipole moment could be system-level for molecules? As far as atom-level properties, isotopic information might be useful. I'm not sure if we want to throw in everything we can imagine now, though, or do things more sparingly on an as-needed basis...

@mfherbst
Copy link
Member

I think we should first collect the things someone has an actual use case for to not avoid clutter

@rashidrafeek
Copy link
Contributor

A property which might be relevant would be the "Topology", which stores the bonding information of the system. In many of the formats used for classical MD calculations this information is also present in the trajectory files.

For example, The Frame type in Chemfiles.jl which is the analogue of an AbstractSystem, stores Topology in addition to positions, velocities and the unitcell.

@jameskermode
Copy link

Here are some examples of system level properties I've used recently:

  • Total energy
  • Kinetic energy
  • Potential energy
  • Free energy (of ion-electron system)
  • Charge
  • Dipole moment
  • Polarisability
  • Stress tensor
  • Strain tensor

I think it will be very hard to anticipate what users would like to store - for example I'd often want to store multiple energies coming from DFT and a variety of more approximate models - so I'd advocate for making it as flexible as possible, but I agree there's nothing wrong with identifying and documenting some standard properties.

@cortner
Copy link
Member

cortner commented Aug 4, 2022

See my implementation of particle States in ACE.jl - that's exactly what it provides. It could potentially move here, or it could inspire a new and better implementation here.

@cortner cortner mentioned this issue Aug 4, 2022
@lmiq
Copy link

lmiq commented Apr 28, 2023

Coming late to the discussion, I have just posted a comment on the Zulip chat about something related.

I think both the for System or for Atom, we could have the additional data represented as a typed generic field:

struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass, ExtraData}
    position::SVector{D, L}
    velocity::SVector{D, V}
    atomic_symbol::Symbol
    atomic_number::Int
    atomic_mass::M
    data::ExtraData
end

We can let the current interface for when ExtraData <: Dict, but with that we can also include type-stable data carrieres for anything the user wants. With the important advantage of being able to overload functions for custom Atom types, such as, for example, show. In my case, I want several additional fields (to represent all ATOM data of a PDB file, and want to print the atoms as:

julia> pdb = PDBTools.wget("1LBD")

julia> pdb
   Array{Atoms,1} with 1870 atoms with fields:
   index name resname chain   resnum  residue        x        y        z  beta occup model segname index_pdb
       1    N     SER     A      225        1   45.228   84.358   70.638 67.05  1.00     1       -         1
       2   CA     SER     A      225        1   46.080   83.165   70.327 68.73  1.00     1       -         2
       3    C     SER     A      225        1   45.257   81.872   70.236 67.90  1.00     1       -         3
                                                       
    1868  OG1     THR     A      462      238  -27.462   74.325   48.885 79.98  1.00     1       -      1868
    1869  CG2     THR     A      462      238  -27.063   71.965   49.222 78.62  1.00     1       -      1869
    1870  OXT     THR     A      462      238  -25.379   71.816   51.613 84.35  1.00     1       -      1870

as for now I need to define a custom Atom structure to be able to overload show, but if I had a parametric type I could just extend AtomsBase.Atom.

I think that would not be breaking at all, and if people feel that is a good idea, I can try to implement it.

@tjjarvinen
Copy link
Collaborator

This is in my opinnion a good suggestion and I can see several cases where it would be useful.

In addition to this, it might be a good idea to change the type of atomic_symbol to a parameter. Because that would allow making the Atom type a bitstype, when using e.g. SVector{2,Char}. But this could have some other issues, so I am a bit unsure of it.

@cortner
Copy link
Member

cortner commented Apr 29, 2023

For exactly that reason we used a wrapped integer in JuLIP but allowed easy translation to a Symbol. With a bit of work the display could even show the symbol instead of the atomic number.

@lmiq
Copy link

lmiq commented Apr 30, 2023

the type of atomic_symbol to a parameter.

I would follow what Cortner suggested, just store the atomic number and make the symbol, or string atomic name, something to be get on the flight.

(although this is a separate issue, and would be breaking, better treat it separately)

@jameskermode
Copy link

The only issue with the last suggestion is that it requires a one to one mapping between species and atomic number, which is maybe ok for quantum mechanics but not for classical MD where there is often more than one type of each atom (even in QM there could be different pseudo potentials)

@lmiq
Copy link

lmiq commented Apr 30, 2023

That right, one more reason to allow a typed extra data field. Then the atomic type information can be stored there in any format needed, and there are many formats possible (Ints, or strings, etc).

For instance someone could implement Atom{D,L,M,OPLSAA}, or Atom{D,L,M,CHARMM}, etc.

@cortner
Copy link
Member

cortner commented Apr 30, 2023

Then I'll promote again my implementation in ACE.jl of wrapped NamedTuples.

https://github.com/ACEsuit/ACE.jl/blob/main/src/states.jl

https://github.com/ACEsuit/ACE.jl/blob/main/test/test_states.jl

@jameskermode
Copy link

I did this in ExtXYZ.jl already. Could be moved here.

https://github.com/libAtoms/ExtXYZ.jl/blob/master/src/atoms.jl

@lmiq
Copy link

lmiq commented Apr 30, 2023

The issue I see with that approach is that it does not enforce anything. It is almost as saying that the interface is just a commitment to the implementation of some functions. IMO the interface is useful if at all agree with at least some properties to be always present.

(I would say positions and atomic number - velocities maybe, but mass, symbol, name, are either redundant with the atomic number or have to be extended arbitrarily for new types).

But if all agree that the interface is only related to the implementation of a subtype of AbstractAtom and some getter and setter functions, there isn't need for this discussion, each package could implement the structure at will. In that case, the base interface should not depend on any type parameter or internal field of the Atom struct.

@cortner
Copy link
Member

cortner commented Apr 30, 2023

I thought we are talking about a prototype implementation. Indeed I don't think the interface can depend on any type parameters at all. Just on getter functions.

@cortner
Copy link
Member

cortner commented Apr 30, 2023

I guess if you want an abstract CHARMM then just implement a new function isCHARMM which defaults to false?

@lmiq
Copy link

lmiq commented Apr 30, 2023

But what interoperability could we get from that interface, at all? That's what I miss.

Why then Atom here is exported at all?

@cortner
Copy link
Member

cortner commented Apr 30, 2023

But what interoperability could we get from that interface, at all?

If the getters are defined and documented in a public interface, that's all you need for interoperability I think? Is there something concrete that you can't do?

Why then Atom here is exported at all?

I think it is intended as a minimal prototype.

I wouldn't be against helping implement another publicly share concrete type say FlexibleAtom or something similar that has more functionality. The standard functionality could be the same as for Atom and hence guaranteed, but if could have arbitrary additional typed field. E.g. in a wrapped NamedTuple, which would also give you the type information you want.

But I wouldn't necessarily want to rely on that concrete implementation, and rather have documented getters that extract the information.

@lmiq
Copy link

lmiq commented May 1, 2023

I think it is intended as a minimal prototype.

Ok, so from this point of view the Atom struct is a minimal prototype. Then, a few questions:

Is the Atom constructor part of the mandatory interface? The docs seem to suggest that.

If the Atom struct is not part of the expected interface, then IMO it should not be exported. And should be better provided as an example. In any case, the present discussion is then of course unnecessary, as arbitrary properties are already accepted by any implementation of a struct for which the list of getters is defined. Is that the correct way to interpret this? The use or not of a symbol to represent the atomic name is also irrelevant as is only the choice for the prototype exemplified.

In this case, wouldn't be sensible to have an AbstractAtom type, to at least carry the information that one expects that an atomic type defined in an independent package is a subtype of AtomsBase.AbstractAtom and, as such, is expected to satisfy the interface?

@cortner
Copy link
Member

cortner commented May 1, 2023

What you suggest makes sense in Java but not in Julia. Abstract super types are not needed to define an interface but are only needed to share methods. At the moment it is not clear what would be shared. Maybe in the future this will change.

@lmiq
Copy link

lmiq commented May 1, 2023

What I can imagine is something like:

julia> abstract type AbstractAtom end

julia> charge(::AbstractAtom) = "not implemented"
charge (generic function with 1 method)

julia> atomic_number(::AbstractAtom) = "not implemented"
atomic_number (generic function with 1 method)

julia> struct Atom <: AbstractAtom
           atomic_number::Int
           charge::Float64
       end

julia> a = Atom(1,1.0)
Atom(1, 1.0)

julia> charge(a)
"not implemented"

such that methods not implemented (but possibly expected) can be documented and provide sensible responses.

@cortner
Copy link
Member

cortner commented May 1, 2023

Why do you need this? The compiler will tell you that it isn't implemented.

But even if you insist you can just use ::Any

@lmiq
Copy link

lmiq commented May 1, 2023

Well, it is a way to inform what the interface expects. And methods implemented across packages can be type annotated on the abstract type to inform that the function in case is expected to follow the interface.

I'm fine not using, but then I don't see even why a package would have to depend on AtomsBase at all.

And, back to the topic, I don't see the point of this issue in particular. The fact that this issue is open makes me believe that it is not clear to everyone what to expect from the interface.

@cortner
Copy link
Member

cortner commented May 1, 2023

We need to depend on AtomsBase to share the functions that we use to access information stored in Atoms or Structures ie systems of particles.

The interface is defined via documentation.

@lmiq
Copy link

lmiq commented May 1, 2023

Do people here agree, at least, that this issue of arbitrary properties is not really fundamental, at least? Meaning, that if one implements these 6 getters:

https://juliamolsim.github.io/AtomsBase.jl/stable/apireference/#Species-/-atom-properties

the interface of the atomic properties is followed as expected?

Seems to me that there is a confusion in what is the interface expected, and what is the Atom structure which is defined as an example of an implementation following the interface, which stores arbitrary properties in a dict, but that's totally secondary.

In any case I don´t think the documentation is clear.

edit: no, I'm not sure about that. Those functions are defined as getters from an abstract system, not an atom. That's confusing. We want an interface to individual atoms or not? Is we do, the natural way (IMO) would be to have an AbstractAtom and the expected functions defined for it, as it is done for AbstractSystem. What is implemented as a concrete type of Atom, if is part of the expected interface, is very limiting (current issue among others), if it is not, it should be placed, IMO, separately, in a example package (or very clearly stated as an example).

@cortner
Copy link
Member

cortner commented May 1, 2023

  • Maybe the documentation can be improved.

  • Atoms is not part of the expected interface as far as I understand, but an implementation detail? I could be wrong.

@cortner
Copy link
Member

cortner commented May 1, 2023

I've said this many times in many discussions and I'll say it again : in the absence of multiple inheritence (but to some extent even with) any sort of AbstractSomethingOrOther limits you severely. Abstract supertypes are only needed to share methods, but it is not needed to document an interface.

@rkurchin
Copy link
Collaborator Author

rkurchin commented May 2, 2023

👆 dropped in to say exactly this – we talked about having something like AbstractAtom a lot early on (as well as several other abstract types), but that ends up putting severe constraints on things because you can't inherit from multiple abstract types in Julia (honestly, I've had some second thoughts about even just having AbstractSystem, since it's already created a few challenges in implementation...)

@cortner
Copy link
Member

cortner commented May 2, 2023

I'm all for dropping AbstractSystem :)

I very rarely use abstract types in my codes nowadays. It was liberating when I realised that I don't need them!!

@rkurchin
Copy link
Collaborator Author

rkurchin commented May 2, 2023

Yeah, I've kind of realized that in Julia, it only makes sense in somewhat narrow cases, and usually where the taxonomy is very unambiguous – e.g. types of numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants