Material Core Standard (MatCore) version 0.3.0

Preamble

Objectives

The Material Core Metadata (MatCore) standard for computational materials science (CMS) represents key aspects of a material-focused computation to make it understandable to the broad materials research community. It provides pragmatic and workable guidelines for documentation of materials data.

By outlining expectations for the metadata accompanying a computational dataset, the MatCore standard primarily aims to facilitate: (1) dataset transparency, discoverability, sharing and reuse in theoretical and computational modeling, (2) verification and validation of materials modeling and simulation software, and (3) training of physics-based and machine learning (ML) models of material behavior. An additional, more challenging objective is to promote reproducibility in CMS by providing the information necessary for researchers to regenerate a published dataset.

Scope

The MatCore standard is focused on CMS methods aimed at computing material properties from a microscopic perspective, as opposed to computational methods in which the material is a component (such as engineering finite element analysis). The MatCore standard is designed to be flexible so that it can be extended to any area of CMS and can accommodate future changes in the field. An initial focus is placed on major CMS methods most widely used at the time of the creation of the standard with the aim of adding other methods in the future. The initial methods to be supported are:

  • DFT: Density functional theory
  • MD: Classical molecular dynamics
  • MBPT: Many-body perturbation theory (GW/BSE)
  • ML: Machine learning
  • PF: Phase field
  • DER: Derived properties from CMS calculations

All methods share a common minimal set of metadata defined by the MatCore standard. In addition, each method has its own MatCore extension (e.g. MatCore-DFT, MatCore-MD) that inherits the minimal MatCore metadata and adds method-specific metadata. Multiple method-specific extensions may be included when the methods are used in concert. For example, an ab initio molecular dynamics (AIMD) dataset could include both MatCore-DFT and MatCore-MD metadata files to document both aspects of the computation.

Format

Each standard consists of a set of Properties with possible SubProperties, SubSubProperties, etc. The term “Property” will be used in the rest of this section to refer collectively to Property, SubProperty, etc. Properties have the following characteristics:

  • Each Property has a name consisting of a series of one or more words all in lower case separated by hyphens.
  • Required Properties are indicated by an asterisk (*) appended to the end of the Property name.
  • Properties have the following features:
    • Definition: A short explanation of the nature of the Property. Best practice is to avoid words from the Property name in the definition, which prevents circular definitions, except in cases where confusion is unlikely.
    • Values: A description of the type of variable (string, integer, real), form (scalar, list, array), and possible values that the Property can take on.
      • When appropriate a controlled vocabulary can be defined. In this case sentence case format is adopted (i.e. for a phrase containing several words, the first letter of the first word is capitalized and all others are lower case except for proper names or acronyms). An option should always be added for users to add terms not included in the controlled vocabulary to allow the standard to support future developments. New terms may eventually be standardized and added to the controlled vocabulary.
      • When a controlled vocabulary is used, adoption of existing standards is recommended whenever possible.
      • When a value has physical units, either the required unit must be specified, or a field provided to define the value units.
      • For Properties that contain SubProperties, this is indicated with the phrase “Contains SubProperties as defined in the table below” (where “table below” points to the specification for the SubProperty standard).
    • Repeatable: A boolean variable that is “True” for Properties that may appear multiple times inside the metadata file. For example, for a dataset created by more than one person, the creator Property can be repeated as many times as needed, each with its own SubProperties.
    • Note(s): Additional clarifying information regarding a Property.
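
The naming and required-marker conventions above can be checked mechanically. The following is a minimal sketch, assuming that a "word" consists only of the letters a to z and that words are joined by single hyphens; the function name `parse_property_name` is illustrative and not part of the Standard.

```python
import re

# Assumed reading of the naming rule: one or more lower-case words (a-z only)
# joined by single hyphens. A trailing "*" marks a required Property.
_PROPERTY_NAME = re.compile(r"[a-z]+(?:-[a-z]+)*")


def parse_property_name(label: str) -> tuple[str, bool]:
    """Split a label such as 'creation-date*' into (name, required)."""
    required = label.endswith("*")
    name = label[:-1] if required else label
    if not _PROPERTY_NAME.fullmatch(name):
        raise ValueError(f"not a valid Property name: {label!r}")
    return name, required


print(parse_property_name("creation-date*"))
print(parse_property_name("microstructure"))
```

A label such as `Creation_Date` is rejected because it contains an upper-case letter and an underscore.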

In addition to the required and optional Properties, other arbitrary Properties can be added to any MatCore Standard metadata files to support developments in the field. Such Properties may eventually be standardized and added to the MatCore Standard.

Examples are provided after each Standard specification in XML format. However, this is for clarification purposes only. The MatCore Standard serves as a conceptual and logical framework. It defines the schema for Properties required to ensure data consistency and interoperability. Critically, the MatCore Standard is implementation-agnostic. It dictates what information must be captured and how it should be structured logically, but it does not mandate a specific substrate, file format, or programming language.

MatCore adopts the three-part Semantic Versioning (SemVer) system (major, minor, patch) to indicate changes to the Standard.
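
As an illustration of how a three-part version string might be handled by a tool, the sketch below parses a version into integers so that versions compare numerically rather than alphabetically. It is a simplification: SemVer pre-release and build suffixes are not handled, and the function name `parse_version` is an assumption made for this example.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


# Tuples compare element by element, so 0.10.0 is newer than 0.3.0,
# which a plain string comparison would get wrong.
print(parse_version("0.3.0") < parse_version("0.10.0"))
```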

Schema for the MatCore Standard

Minimal MatCore Metadata

The minimal MatCore metadata covers aspects of a dataset that are deemed important to report for data findability and reuse and that are universal across all computational methods. The minimal metadata is tabulated below.

Table Min-1: Minimal MatCore metadata

Property | SubProperty | Description
creator*

Definition: The author who generated the data and institutional affiliation.

Values: Contains SubProperties as listed below.

Repeatable: True

name*

Definition: The name of the author who generated the data.

Values: A string containing the author name.

Repeatable: False

affiliation*

Definition: The affiliation of the author who generated the data.

Values: A string containing the author affiliation.

Repeatable: True

title*

Definition: A single sentence description of the dataset.

Values: A string containing the dataset title.

Repeatable: False

creation-date*

Definition: The calendar date when the dataset was created.

Values: A string containing the creation date in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard)

Repeatable: False

description*

Definition: A synopsis of the dataset, its contents, and purpose.

Values: A string containing a description of the dataset contents. This can optionally include information on the format in which the dataset is stored (e.g. pointer to a relevant schema).

Repeatable: False

disclaimer

Definition: A statement of applicability provided by the creator(s) informing users of the intended use and/or limitations of this dataset.

Values: A string containing the dataset disclaimer.

Repeatable: False

material*

Definition: A description of the chemical composition and structure of a substance in this dataset.

Values: Contains SubProperties as listed below.

Repeatable: True

phase*

Definition: The structure of the material included in the dataset.

Values: A list of strings containing one or more of the following values:

  • “Amorphous” – A solid material without crystalline order
  • “Crystal” – A solid material characterized by a repeating pattern of atoms.
  • “Quasicrystal” – A crystal-like material with long-range order, but without translational periodicity.
  • “Molecule” – A finite group of atoms bonded together forming an independent unit.
  • “Liquid” – A fluid structure (e.g. the molten state of a material)
  • “Gas” – A structure where atoms are far apart and only interact during collisions.
  • “Plasma” – A gas comprised of ions and electrons.
  • Other user-specified value if none of the above apply.

Repeatable: False

description

Definition: An explanation of the nature of the material, e.g. a crystal structure designation, a chemical formula for a molecule, etc.

Values: A string containing the material description.

Repeatable: False

constituent*

Definition: A chemical element included in the material and its concentration.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

microstructure

Definition: The arrangement of phases, grains and defects in a material.

Values: A string containing a description of the microstructure of the material.

Repeatable: False

computation*

Definition: A description of a computer simulation used to generate data in the dataset.

Values: Contains SubProperties as listed below.

Repeatable: True

method-class*

Definition: The general category to which the computational technique belongs.

Values: A string containing one of the following values:

  • “Electronic”
  • “Atomistic”
  • “Mesoscopic”
  • “Continuum”
  • “Data-driven”
  • Other user-specified value if none of the above apply.

Repeatable: False

Note: The categories listed above are based on the MODA standard.

method*

Definition: The computational materials science (CMS) approach used in the computation.

Values: A string containing one of the following values:

For method-class “Electronic”:

  • “CC” – Coupled Cluster
  • “QMC” – Quantum Monte Carlo
  • “DFT” – Density Functional Theory
  • “MBPT” – Many-Body Perturbation Theory methods (such as GW and BSE)

For method-class “Atomistic”:

  • “MC” – Monte Carlo
  • “MD” – Molecular Dynamics

For method-class “Mesoscopic”:

  • “DDD” – Discrete Dislocation Dynamics
  • “KMC” – Kinetic Monte Carlo
  • “CGMD” – Coarse-grained MD

For method-class “Continuum”:

  • “PF” – Phase Field

For method-class “Data-driven”:

  • “ML” – Machine Learning
  • Other user-specified value if none of the above apply.

Repeatable: False

simulation-conditions*

Definition: The interactions between the system being modeled and the rest of the world maintained during the computation.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: False

software*

Definition: Identification of a computer package used to perform the calculations.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

citation

Definition: Information that uniquely identifies a source being acknowledged.

Values: Contains SubProperties as listed below.

Repeatable: True

reference*

Definition: Text that uniquely identifies the source of information being cited.

Values: A string containing the reference to the source.

Repeatable: False

doi

Definition: The digital object identifier (DOI) for the source.

Values: A string containing the source DOI.

Repeatable: False

link

Definition: A URI pointing to a permanent location of the source.

Values: A string containing the file URI.

Repeatable: False

funding

Definition: Information about received monetary support or other resources to generate the dataset.

Values: Contains SubProperties as listed below.

Repeatable: True

award-title*

Definition: Name of the grant that provided funding to generate the dataset.

Values: A string containing the award title.

Repeatable: False

funder*

Definition: The name of the funding agency that provided money and/or resources to generate the dataset.

Values: A string containing the funding agency using Crossref Open Funder Registry designation.

Repeatable: False

award-number

Definition: A funder identifier for the grant.

Values: A string containing the award number.

Repeatable: False

related-content

Definition: Other datasets that are connected to the current one in some manner.

Values: Contains SubProperties as listed below.

Repeatable: True

links*

Definition: Pointer to one or more related datasets.

Values: A list of permanent pointers to related datasets (such as MatCore IDs, DOIs, URIs, etc.)

Repeatable: False

description

Definition: Explanation of the relationship between the related content and the current dataset.

Values: A string containing a description of the related content.

Repeatable: False

provenance

Definition: History of the dataset, detailing its origins and transformations.

Values: Contains SubProperties as listed below.

Repeatable: True

event-type*

Definition: Description of change made to the dataset.

Values: A string containing one of the following values:

  • “Initial creation” – Initial creation of the dataset
  • “Admin update” – Editorial update to the dataset not requiring a version update (e.g. spelling correction).
  • “Version update” – Change to the dataset or associated files requiring a new version
  • “Metadata update” – Change to the metadata, but not the dataset itself.
  • Other user-specified value if none of the above apply.

Repeatable: False

date*

Definition: The date when the change was made.

Values: A string containing the date of the change in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard).

Repeatable: False

agent*

Definition: Identity of the entity responsible for the change.

Values: A string containing the name or other designation identifying who or what is responsible for the change.

Repeatable: False

comments

Definition: Explanation for the change in provenance.

Values: A string containing an explanation for the nature of the changes leading to the revised provenance.

Repeatable: False

checksum

Definition: A digital fingerprint for the dataset or associated files.

Values: A list containing two strings: the name of a file and the checksum for that file (e.g. its SHA-256 hash).

Repeatable: False

matcore-id*

Definition: An identifier for the dataset.

Values: A string containing the MatCore identifier.

Repeatable: False

matcore-date*

Definition: The calendar date when this MatCore document was created.

Values: A string containing this document creation date in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard).

Repeatable: False

license*

Definition: A contract defining the terms and conditions under which the dataset can be used.

Values: A string containing the dataset license in a format conforming to the SPDX Standard.

Repeatable: False
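
The checksum Property in Table Min-1 pairs a file name with a fingerprint of the file. The sketch below shows one way such a pair could be produced using SHA-256, which the table gives only as an example algorithm. The function name `checksum_entry` and the file name `example.dat` are placeholders introduced for illustration.

```python
import hashlib


def checksum_entry(filename: str, data: bytes) -> list[str]:
    """Return [file name, SHA-256 hex digest] for the given file contents."""
    return [filename, hashlib.sha256(data).hexdigest()]


# "example.dat" and its contents are placeholders, not part of any dataset.
print(checksum_entry("example.dat", b"abc"))
```

For a file on disk, the bytes would typically be read in binary mode before being passed to the function.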

Table Min-2: Constituent SubSubProperties

SubProperty | SubSubProperty | Description
constituent*

species*

Definition: A chemical element included in the material.

Values: A string containing the chemical designation of the element (e.g. “C”).

Repeatable: False

concentration*

Definition: The fraction of the material composed of this element.

Values: A string containing the concentration in percent.

Repeatable: False

Table Min-3: Simulation conditions SubSubProperties

SubProperty | SubSubProperty | Description
simulation-conditions*

type*

Definition: The nature of the simulation being performed, whether equilibrium, nonequilibrium, or nonstandard.

Values: String containing one of the following values defining the simulation conditions:

  • “Equilibrium” – Thermodynamic equilibrium
  • “Nonequilibrium” – Conforming to some definition of nonequilibrium thermodynamics
  • “Nonstandard” – A simulation not fitting into the equilibrium and nonequilibrium categories.

Note 1: The SubSubProperties listed below (aside from description) are provided as appropriate given the nature of the simulation. Providing a SubSubProperty means that its value is imposed on the simulation as an external constraint.

Note 2: Units for dimensioned variables are to be given in the SI unit system.

Repeatable: False

description

Definition: Explanatory text clarifying the simulation type selection.

Values: A string containing the simulation type explanation.

Repeatable: False

number-of-particles

Definition: A count of atoms or other discrete entities comprising the material.

Values: An integer containing the number of particles N. Standard units: dimensionless.

Repeatable: False

volume

Definition: The amount of space occupied by the material.

Values: A real number containing the volume V. Standard units: cubic meters (m³).

Repeatable: False

mass-density

Definition: The total mass of the material divided by the volume it occupies.

Values: A real number containing the mass density 𝜌. Standard units: kilograms per cubic meter (kg/m³).

Repeatable: False

number-density

Definition: The total number of particles comprising the material divided by the volume it occupies.

Values: A real number containing the number density n. Standard units: inverse cubic meter (1/m³).

Repeatable: False

cell

Definition: The domain occupied by the material in the simulation (if fixed).

Values: Ordered list of three vectors (a, b, c), each with three components, that define the simulation cell. The vectors (a, b, c) need not be orthogonal, and the magnitudes (|a|, |b|, |c|) are the supercell vector lengths. Standard units: meter (m).

Repeatable: False

cell-reference

Definition: The domain occupied by the material in the simulation in configuration relative to which strains are defined.

Values: Ordered list of three vectors (a0, b0, c0), each with three components, that define the cell used in the simulation in its reference configuration. The vectors (a0, b0, c0) need not be orthogonal, and the magnitudes (|a0|, |b0|, |c0|) are the reference cell vector lengths. Standard units: meter (m).

Repeatable: False

cell-periodicity

Definition: Specification of whether periodic boundary conditions are applied along the cell vector directions. This applies to both cell and cell-reference.

Values: Ordered list of three booleans (pa, pb, pc), which are set to true if periodic boundary conditions are applied along the corresponding cell vectors (a, b, c). Standard units: dimensionless.

Repeatable: False

temperature

Definition: A measure of the hotness or coldness of an external heat bath in thermal contact with the material.

Values: A real number containing the temperature T. Standard units: kelvin (K).

Repeatable: False

stress

Definition: A uniform force per unit current area, including both normal and shear components, applied to the material during the simulation through contact with an external loading device.

Values: An array of six real numbers containing the components of the (symmetric) Cauchy stress tensor σ in the following order: [σ11, σ22, σ33, σ23, σ13, σ12]. The stress components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c×a, and e3 = e1×e2, where (a, b, c) are the vectors defining the cell, which may not be orthogonal. Standard units: pascal (Pa).

Note 1: Requires the definition of the cell subproperty in order for a basis to be defined relative to which the tensor components are given.

Note 2: The adopted sign convention is for tensile normal stresses to be positive, and compressive normal stresses to be negative. This implies that for a material subjected to a positive pressure p, the stress tensor components are [-p, -p, -p, 0, 0, 0].

Repeatable: False

strain

Definition: A deformation relative to a reference configuration imposed on the simulation cell expressed in the referential frame (Lagrangian strain tensor).

Values: An array of six real numbers containing the components of the (symmetric) Lagrangian strain tensor E in the following order: [E11, E22, E33, E23, E13, E12]. The strain components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a0, e2 is in the direction of c0×a0, and e3 = e1×e2, where (a0, b0, c0) are the vectors defining the cell in the reference configuration, which may not be orthogonal. Standard units: dimensionless.

Note: Requires the definition of cell-reference in order for the basis to be defined relative to which the tensor components are given.

Repeatable: False

strain-rate

Definition: The derivative with respect to time of a deformation relative to a reference configuration imposed on the simulation cell expressed in the referential frame (Lagrangian strain rate tensor), which is constant throughout the simulation.

Values: An array of six real numbers containing the components of the (symmetric) Lagrangian strain rate tensor Ė in the following order: [Ė11, Ė22, Ė33, Ė23, Ė13, Ė12]. The strain rate components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a0, e2 is in the direction of c0×a0, and e3 = e1×e2, where (a0, b0, c0) are the vectors defining the cell in the reference configuration, which may not be orthogonal. Standard units: inverse second (1/s).

Note: Requires the definition of cell-reference in order for the basis to be defined relative to which the tensor components are given.

Repeatable: False

heat-flux

Definition: The amount of thermal energy transferred to the material per unit area per unit time along a given direction.

Values: An array of three real numbers containing the components of the heat flux vector q. The heat flux components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c×a, and e3 = e1×e2, where (a, b, c) are the vectors defining the cell, which may not be orthogonal. Standard units: watts per square meter (W/m²).

Repeatable: False

temperature-gradient

Definition: A physical quantity that describes the rate and direction of maximum temperature change per unit distance.

Values: An array of three real numbers containing the components of the temperature gradient vector ∇T. The temperature gradient components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c×a, and e3 = e1×e2, where (a, b, c) are the vectors defining the cell, which may not be orthogonal. Standard units: kelvin per meter (K/m).

Repeatable: False
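
The tensor and vector components in Table Min-3 are expressed relative to an orthonormal basis built from the cell vectors: e1 along a, e2 along c×a, and e3 = e1×e2. The following sketch constructs that basis and also encodes the stated sign convention for a positive pressure. The function names and the sample cell are assumptions made for illustration only.

```python
import math

Vector = tuple[float, float, float]


def cross(u: Vector, v: Vector) -> Vector:
    """Right-handed cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u: Vector) -> Vector:
    """Scale u to unit length."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (u[0] / norm, u[1] / norm, u[2] / norm)


def tensor_basis(a: Vector, c: Vector) -> tuple[Vector, Vector, Vector]:
    """Basis with e1 along a, e2 along c x a, and e3 = e1 x e2.

    The cell vector b does not enter the construction.
    """
    e1 = unit(a)
    e2 = unit(cross(c, a))
    e3 = cross(e1, e2)
    return e1, e2, e3


def hydrostatic_stress(pressure: float) -> list[float]:
    """Components [s11, s22, s33, s23, s13, s12] for a pressure p,
    using the tensile-positive sign convention."""
    p = float(pressure)
    return [-p, -p, -p, 0.0, 0.0, 0.0]


# Hypothetical non-orthogonal cell: c is tilted toward a.
e1, e2, e3 = tensor_basis(a=(2.0, 0.0, 0.0), c=(1.0, 0.0, 3.0))
print(e1, e2, e3)
print(hydrostatic_stress(1.0e5))
```

Because e2 is perpendicular to a by construction, the basis remains orthonormal even when the cell vectors are not orthogonal.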

Table Min-4: Software SubSubProperties

SubProperty | SubSubProperty | Description
software*

name*

Definition: The name of the software package.

Values: A string containing the name of the software package (e.g. “VASP”).

Repeatable: False

version

Definition: The version of the software package.

Values: A string containing the software version (e.g. “6.3.1”)

Repeatable: False

file

Definition: Input or output file for the software package associated with the computation. This includes the input file used to perform the computation, associated data files, and output generated by the software package. Most important are input files to enable reproduction of the dataset.

Values: Contains SubSubSubProperties as defined in the table below.

Repeatable: True

Table Min-5: File SubSubSubProperties

SubSubProperty | SubSubSubProperty | Description
file

name*

Definition: The file name.

Values: A string containing the name of the included file.

Repeatable: False

description*

Definition: A brief description of the file and its contents.

Values: A string containing the file description.

Repeatable: False

contents

Definition: The text and/or data contained in the file. Not required in case a link to the file is provided instead.

Values: A string containing the file contents.

Repeatable: False

link

Definition: A URI pointing to a permanent location of the file online.

Values: A string containing the file URI.

Repeatable: False

Example: Minimal MatCore Metadata (Dataset source: https://doi.org/10.60732/8e9bc5b0)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <creator>
    <name>Albert P. Bartók</name>
    <affiliation>Rutherford Appleton Laboratory</affiliation>
  </creator>
  <creator>
    <name>James Kermode</name>
    <affiliation>University of Warwick</affiliation>
  </creator>
  <creator>
    <name>Noam Bernstein</name>
    <affiliation>Naval Research Laboratory</affiliation>
  </creator>
  <creator>
    <name>Gábor Csányi</name>
    <affiliation>University of Cambridge</affiliation>
  </creator>
  <title>Si_PRX_GAP</title>
  <creation-date>2021-02-22</creation-date>
  <description>The original DFT training data for the general-purpose silicon interatomic potential described in the associated publication. The kinds of configuration that were included were selected based on intuition and past experience to obtain good coverage pertaining to a range of properties.</description>
  <material>
    <phase>Crystal</phase>
    <constituent>
      <species>Si</species>
      <concentration>100</concentration>
    </constituent>
    <description>A variety of Si configurations including different phases, surfaces, vacancies, interstitials, dislocations and crack tips.</description>
  </material>
  <computation>
    <method-class>Electronic</method-class>
    <method>DFT</method>
    <simulation-conditions>
      <type>Equilibrium</type>
    </simulation-conditions>
    <software>
      <name>CASTEP</name>
    </software>
  </computation>
  <matcore-id>mc-ad83jsd3-1</matcore-id>
  <matcore-date>2026-02-14</matcore-date>
  <license>GPL-3.0-only</license>
</root>

Note: The matcore-id format in the example is tentative and subject to change.
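
A metadata file in the XML form used above can be screened for missing required Properties. The following sketch, using only the Python standard library, checks the required top-level Properties of Table Min-1 and the YYYY-MM-DD form of the two date Properties. It does not check SubProperties or controlled vocabularies, and the function names and the abbreviated sample document are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Required top-level Properties of Table Min-1 (names marked with "*").
REQUIRED_TOP_LEVEL = (
    "creator", "title", "creation-date", "description", "material",
    "computation", "matcore-id", "matcore-date", "license",
)
DATE_FIELDS = ("creation-date", "matcore-date")


def is_iso_date(text: str) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_minimal(xml_text: str) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    root = ET.fromstring(xml_text)
    problems = [f"missing required Property: {name}"
                for name in REQUIRED_TOP_LEVEL if root.find(name) is None]
    for name in DATE_FIELDS:
        for node in root.findall(name):
            if not is_iso_date((node.text or "").strip()):
                problems.append(f"{name} is not a YYYY-MM-DD date: {node.text!r}")
    return problems


# Abbreviated, hypothetical document used only to exercise the check.
SAMPLE = """<root>
  <title>Si_PRX_GAP</title>
  <creation-date>2021-02-22</creation-date>
  <matcore-date>14/02/2026</matcore-date>
</root>"""

for problem in check_minimal(SAMPLE):
    print(problem)
```

Because the MatCore Standard is implementation-agnostic, a check of this kind applies only to one particular XML serialization and is not part of the Standard itself.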

DFT Method-Specific MatCore Metadata

The metadata for density functional theory (DFT) data (MatCore-DFT) is tabulated below. This includes the level of theory (exchange-correlation functional, core-electron model, and valence-electron model) and associated parameters. (Note: This standard inherits the metadata in the minimal MatCore standard, adding the properties below.)

Table DFT-1: MatCore-DFT metadata

Property | SubProperty | Description
xc-functional*

Definition: Approximation used to describe the electron exchange and electron correlation energies

Values: Contains SubProperties as listed below.

Repeatable: True

type*

Definition: The mathematical form of the exchange-correlation functional.

Values: String containing one of the following values:

  • “LDA” – Local Density Approximation
  • “GGA” – Generalized Gradient Approximation
  • “Meta GGA” – Extension to GGA including dependence on the kinetic energy density and/or the Laplacian of the electron density
  • “Hybrid” – An approximation for the exchange-correlation (XC) functional that includes a portion of exact exchange from Hartree–Fock theory or machine learning, with the rest coming from other sources (ab initio or empirical).
  • “ML” – Machine learning based functional
  • Other user-specified value if none of the above apply.

Repeatable: False

description

Definition: Explanatory text providing more information about the type of the exchange-correlation method used including any relevant identifying information.

Values: A string containing more information about the exchange-correlation type. Examples of possible content by type:

  • LDA: Common parameterizations: VWN, PW92, etc.
  • GGA: Typical choices are PW91, PBE, rPBE, revPBE, BLYP, etc.
  • Meta GGA: Specific named examples include TPSS, SCAN, M06, etc.
  • Hybrid: B3LYP, PBE0, HSE03, HSE06, NeuralXC, DeePKS, …
  • ML: DM21, NNLDA, NNGGA, CIDER, etc.
  • others: BEEF, VdW-corrections (added on top of XC, e.g., D2, D3, D4), etc.

Repeatable: False

xc-parameter

Definition: A fixed variable associated with the specified exchange-correlation functional method.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

core-electron-model*

Definition: Model of the core electrons used in the calculation.

Values: Contains SubProperties as listed below.

Repeatable: False

type*

Definition: The method used to model the electrons.

Values: String containing one of the following values:

  • “All Electron” – All of the electrons in an atom are modeled explicitly.
  • “LAPW” – Linearized Augmented Plane Wave
  • “Pseudopotential” – Valence electrons are treated explicitly, and core electrons are modeled by a simplified potential.
  • “PAW” – Projector Augmented Wave
  • Other user-specified value if none of the above apply.

Repeatable: False

pseudopotential

Definition: For type “Pseudopotential”, or type “PAW”, additional information about the pseudopotentials.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

valence-electron-model*

Definition: The elementary functions used for expanding electronic wave functions.

Values: Contains SubProperties as listed below.

Repeatable: False

type*

Definition: The choice of a basis set to expand electronic wave functions.

Values: String containing one of the following values:

  • “Plane waves”
  • “LAPW” – Linearized Augmented Plane Wave
  • “Localized orbitals”
  • “PAW”
  • Other user-specified value if none of the above apply.

Repeatable: True

localized-orbital-basis-set

Definition: For type “Localized orbitals”, additional information about the localized orbital basis set.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

kinetic-energy-cutoff

Definition: For type not “Localized orbitals”, the maximum kinetic energy of the plane waves included in the calculation of wave functions.

Values: Real number containing the kinetic energy cutoff. Standard units: electronvolt (eV).

Repeatable: False

charge-density-cutoff

Definition: For type not “Localized orbitals”, the energy value used to truncate the plane wave expansion of the electron charge density.

Values: Real number containing the charge density cutoff. Standard units: electronvolt (eV).

Repeatable: False

calculation-physics

Definition: Additional phenomena included beyond the standard DFT calculation.

Values: String containing one of the following values:

  • “Relativistic effects”
  • “Spin polarization”
  • “Noncollinear spin polarization”
  • “On-site correlation” (e.g. DFT+U)
  • “Van der Waals”
  • Other user-specified value if none of the above apply.

Repeatable: True

k-point-mesh

Definition: For periodic systems, a grid of points sampled within the Brillouin zone, used to approximate integrals and calculate electronic properties by discretizing the wavefunction across reciprocal space.

Values: Contains SubProperties as listed below.

Repeatable: False

type*

Definition: The choice of how k-points are distributed on the mesh.

Values: String containing one of the following values:

  • “Monkhorst-Pack”
  • “Gamma centered”
  • “Irregular”
  • Other user-specified value if none of the above apply.

Repeatable: False

number-of-points

Definition: How many k-points are included along each reciprocal lattice vector.

Values: Ordered list of three integers containing the number of k-points along the three reciprocal lattice vector directions. Standard units: dimensionless.

Repeatable: False

k-point-spacing

Definition: Separation between k-points along each reciprocal lattice vector direction.

Values: Real number containing the k-point spacing. Standard units: 1/angstrom (1/Å).

Repeatable: False

shift

Definition: Translation of the k-point mesh along the reciprocal lattice vector directions.

Values: Ordered list of three real numbers containing the shifts along the three reciprocal lattice vector directions. Standard units: 1/angstrom (1/Å).

Repeatable: False

smearing-type

Definition: The choice of method for assigning fractional occupancies of electronic states and/or facilitating reciprocal space integrations.

Values: String containing one of the following values:

  • “None”
  • “None - tetrahedron”
  • “None - Blöchl-corrected tetrahedron”
  • “Gaussian”
  • “Fermi”
  • “Methfessel-Paxton”
  • Other user-specified value if none of the above apply.

Repeatable: False

smearing-width

Definition: A parameter that controls the broadening of the electron energy levels near the Fermi level.

Values: Real number containing the smearing width. Standard units: electronvolt (eV).

Repeatable: False

methfessel-paxton-order

Definition: For smearing-type “Methfessel-Paxton”, the order of the polynomial approximation used in the Methfessel-Paxton smearing method.

Values: Integer containing the Methfessel-Paxton order. Standard units: dimensionless.

Repeatable: False

self-consistent-field-convergence

Definition: Method and termination criteria defining the precision of self-consistent solution of the Kohn-Sham equations.

Values: Contains SubProperties as listed below.

Repeatable: False

method

Definition: The self-consistent field mixing and/or extrapolation scheme for the iterative solution of the Kohn-Sham equations.

Values: String containing one of the following values:

  • “Simple mixing”
  • “Pulay mixing”
  • “Blocked Davidson”
  • “DIIS” – Direct Inversion in the Iterative Subspace
  • “BFGS” – Broyden–Fletcher–Goldfarb–Shanno algorithm
  • Other user-specified value if none of the above apply.

Repeatable: False

tolerance*

Definition: Convergence criterion for the self-consistent loop.

Values: Real number containing the convergence tolerance. Standard units: electron Volt (eV).

Repeatable: False

relaxation-tolerance-energy

Definition: Total energy convergence criterion for atomic relaxation.

Values: Real number containing the convergence tolerance. Standard units: electron Volt (eV).

Repeatable: False

relaxation-tolerance-forces

Definition: Maximum force convergence criterion for atomic relaxation.

Values: Real number containing the convergence tolerance. Standard units: electron Volt per angstrom (eV/Å).

Repeatable: False
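Illustration (non-normative): many codes report convergence tolerances in Rydberg or Hartree atomic units rather than the standard units requested above. The Python sketch below converts energy and force tolerances to eV and eV/Å. The conversion factors are CODATA 2018 values; the function names and unit labels are assumptions of this illustration.

```python
# CODATA 2018 conversion factors.
HARTREE_IN_EV = 27.211386245988
RYDBERG_IN_EV = HARTREE_IN_EV / 2.0
BOHR_IN_ANGSTROM = 0.529177210903

ENERGY_TO_EV = {"eV": 1.0, "Ry": RYDBERG_IN_EV, "Ha": HARTREE_IN_EV}
LENGTH_TO_ANGSTROM = {"angstrom": 1.0, "bohr": BOHR_IN_ANGSTROM}

def energy_to_ev(value, unit):
    """Convert an energy tolerance to eV."""
    return value * ENERGY_TO_EV[unit]

def force_to_ev_per_angstrom(value, energy_unit, length_unit):
    """Convert a force tolerance (energy per length) to eV/angstrom."""
    return value * ENERGY_TO_EV[energy_unit] / LENGTH_TO_ANGSTROM[length_unit]

# Hypothetical tolerances as they might appear in a code's native input.
scf_tolerance_ev = energy_to_ev(1.0e-6, "Ry")
force_tolerance = force_to_ev_per_angstrom(1.0e-3, "Ry", "bohr")
```

Here a force tolerance of 1.0e-3 Ry/bohr corresponds to roughly 0.0257 eV/Å.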

comment

Definition: Additional information on the DFT calculation. For example, non-self-consistent constraints, such as details about imposed state occupation or moment freezing.

Values: String containing the calculation comments.

Repeatable: True

Table DFT-2: Exchange-correlation parameter SubSubProperties

SubProperty | SubSubProperty | Description
xc-parameter

name*

Definition: The designation of the exchange-correlation parameter.

Values: A string containing the name of an exchange-correlation parameter.

Repeatable: False

value*

Definition: The data associated with the exchange-correlation parameter.

Values: An integer, real, boolean, or string containing the parameter value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation of the meaning, source, or purpose of the parameter.

Values: A string explaining the exchange-correlation parameter. This can be the mixing parameter for a hybrid exchange-correlation, or other method-specific parameters.

Repeatable: False
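Illustration (non-normative): an xc-parameter entry combines name, value, unit, and an optional description. The Python sketch below builds such an element with the standard library. The helper name and the parameter name “exact-exchange-fraction” are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

def parameter_element(tag, name, value, unit, description=None):
    """Build a name/value/unit(/description) parameter element."""
    element = ET.Element(tag)
    ET.SubElement(element, "name").text = name
    ET.SubElement(element, "value").text = str(value)
    ET.SubElement(element, "unit").text = unit
    if description is not None:
        ET.SubElement(element, "description").text = description
    return element

# Hypothetical mixing parameter of a hybrid exchange-correlation functional.
xc_parameter = parameter_element(
    "xc-parameter",
    name="exact-exchange-fraction",
    value=0.25,
    unit="dimensionless",
    description="Mixing parameter for the hybrid exchange-correlation functional",
)
xml_text = ET.tostring(xc_parameter, encoding="unicode")
```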

Table DFT-3: Pseudopotential SubSubProperties

SubProperty | SubSubProperty | Description
pseudopotential

name*

Definition: The name of the pseudopotential.

Values: A string containing the pseudopotential name.

Repeatable: False

description

Definition: Explanatory text about the pseudopotential including any relevant identifying information.

Values: A string containing the pseudopotential description.

Repeatable: False

type*

Definition: The physical form of the pseudopotential.

Values: String containing one of the following values:

  • “Norm conserving”
  • “Ultrasoft”
  • Other user-specified value if none of the above apply.

Repeatable: False

number-of-valence-electrons*

Definition: The number of electrons treated explicitly.

Values: An integer containing the number of valence electrons. Standard units: dimensionless.

Repeatable: False

repository

Definition: The database from which the pseudopotential was obtained.

Values: String containing the name of the repository.

Repeatable: False

version

Definition: The version of the pseudopotential.

Values: A string containing the pseudopotential version (e.g. “6.3.1”).

Repeatable: False

doi

Definition: The digital object identifier (DOI) for the pseudopotential.

Values: A string containing the pseudopotential DOI.

Repeatable: False

unique-identifier

Definition: A hash or other token providing a digital signature for the pseudopotential.

Values: A string containing the pseudopotential identifier (e.g. a SHA256 hash for a VASP pseudopotential).

Repeatable: False
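Illustration (non-normative): a SHA256 hash for the unique-identifier field can be computed with the Python standard library, as sketched below. The function names are assumptions of this illustration. The digest is taken over the raw bytes of the file, so any change to the file, including line endings, produces a different identifier.

```python
import hashlib
import io

def stream_sha256(handle, chunk_size=1 << 20):
    """SHA256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def file_sha256(path):
    """SHA256 hex digest of the file at the given path."""
    with open(path, "rb") as handle:
        return stream_sha256(handle)

# Self-check on an in-memory stream using the standard "abc" test vector.
demo_digest = stream_sha256(io.BytesIO(b"abc"))
```

For a real pseudopotential file, `file_sha256` would be called with the path to that file and the returned string stored in unique-identifier.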

Table DFT-4: Localized orbitals basis set SubSubProperties

SubProperty | SubSubProperty | Description
localized-orbitals-basis-set

type*

Definition: The designation of the set of orbital functions used for expanding electronic wave functions.

Values: String containing the basis set name (e.g. “6-31G”, “DZP”, etc.).

Repeatable: False

repository

Definition: The database from which the localized-orbital basis set was obtained.

Values: String containing the name of the repository.

Repeatable: False

version

Definition: The version of the localized-orbital basis set.

Values: A string containing the localized-orbital basis set version.

Repeatable: False

doi

Definition: The digital object identifier (DOI) for the localized-orbital basis set.

Values: A string containing the localized-orbital basis set DOI.

Repeatable: False

unique-identifier

Definition: A hash or other token providing a digital signature for the localized-orbital basis set.

Values: A string containing the localized-orbital basis set identifier.

Repeatable: False

Example: DFT Method-Specific Metadata (Dataset source: https://doi.org/10.60732/8e9bc5b0)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <xc-functional>
    <type>GGA</type>
    <description>PW91 functional (the choice is motivated by the existence of a large-scale simulation of the melting point with this functional)</description>
  </xc-functional>
  <core-electron-model>
    <type>Pseudopotential</type>
    <pseudopotential>
      <name>Ultrasoft Pseudopotentials (USP)</name>
      <type>Ultrasoft</type>
      <number-of-valence-electrons>4</number-of-valence-electrons>
    </pseudopotential>
  </core-electron-model>
  <valence-electron-model>
    <type>Plane waves</type>
    <kinetic-energy-cutoff>250.0</kinetic-energy-cutoff>
  </valence-electron-model>
  <k-point-mesh>
    <type>Monkhorst-Pack</type>
    <k-point-spacing>0.03</k-point-spacing>
    <smearing-width>0.05</smearing-width>
  </k-point-mesh>
</root>
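Illustration (non-normative): a consumer of a MatCore-DFT file could read it with the Python standard library and flatten it into path/value records, as sketched below using the example above. The function name and the path notation are assumptions of this illustration. Values are collected in lists because some properties are repeatable.

```python
import xml.etree.ElementTree as ET

DFT_XML = """<root>
  <xc-functional>
    <type>GGA</type>
    <description>PW91 functional (the choice is motivated by the existence of a large-scale simulation of the melting point with this functional)</description>
  </xc-functional>
  <core-electron-model>
    <type>Pseudopotential</type>
    <pseudopotential>
      <name>Ultrasoft Pseudopotentials (USP)</name>
      <type>Ultrasoft</type>
      <number-of-valence-electrons>4</number-of-valence-electrons>
    </pseudopotential>
  </core-electron-model>
  <valence-electron-model>
    <type>Plane waves</type>
    <kinetic-energy-cutoff>250.0</kinetic-energy-cutoff>
  </valence-electron-model>
  <k-point-mesh>
    <type>Monkhorst-Pack</type>
    <k-point-spacing>0.03</k-point-spacing>
    <smearing-width>0.05</smearing-width>
  </k-point-mesh>
</root>"""

def flatten(element, prefix=""):
    """Map each leaf path (e.g. 'k-point-mesh/type') to a list of text values."""
    records = {}
    for child in element:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):  # element has children: recurse into it
            for key, values in flatten(child, path).items():
                records.setdefault(key, []).extend(values)
        else:
            records.setdefault(path, []).append((child.text or "").strip())
    return records

records = flatten(ET.fromstring(DFT_XML))
cutoff = float(records["valence-electron-model/kinetic-energy-cutoff"][0])
```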

MD Method-Specific MatCore Metadata

The metadata for classical molecular dynamics (MD) data (MatCore-MD) is tabulated below. This category includes energy and free energy minimization, equilibrium and nonequilibrium MD, and related approaches. (Note: This standard inherits the metadata in the minimal MatCore standard, adding the properties below.)

Table MD-1: MatCore-MD metadata

Property | SubProperty | Description
computation*

Definition: The form of molecular dynamics performed.

Values: Contains SubProperties as listed below.

Repeatable: True

mode*

Definition: The type of the molecular dynamics simulation performed.

Values: String containing one of the following values:

  • “Static” – Snapshot(s) of arbitrary particle configuration
  • “Minimization” – Energy minimization
  • “Equilibrium dynamics” – Equilibrium molecular dynamics
  • “Nonequilibrium dynamics” – Nonequilibrium molecular dynamics
  • “Free energy” – Free energy calculations
  • Other user-specified value if none of the above apply.

Repeatable: False

algorithm

Definition: The computational method used to perform the computation.

Values: String containing one of the following values based on the specified mode:

  • For “Minimization”:
    • “Simplex”
    • “Damped dynamics”
    • “FIRE”
    • “Steepest descent”
    • “Conjugate gradients”
    • “BFGS”
    • “Simulated annealing”
    • “SGLD” – Stochastic Gradient Langevin Dynamics
    • Other user-specified value if none of the above apply.
  • For “Equilibrium dynamics” or “Nonequilibrium dynamics”:
    • “Leap frog”
    • “Runge-Kutta”
    • “Velocity Verlet”
    • “Verlet”
    • Other user-specified value if none of the above apply.
  • For “Free energy”:
    • “Harmonic”
    • “Metadynamics”
    • “PMF” – Potential of Mean Force
    • “Thermodynamic integration”
    • “Umbrella sampling”
    • Other user-specified value if none of the above apply.

Repeatable: False
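Illustration (non-normative): because the listed algorithm values depend on the mode, a tool can hold them in a lookup table and report whether a given value is one of the listed options or a user-specified one. The Python sketch below does this. The function name and the returned labels are assumptions of this illustration; user-specified values remain permitted by the standard.

```python
DYNAMICS_ALGORITHMS = {"Leap frog", "Runge-Kutta", "Velocity Verlet", "Verlet"}

ALGORITHMS_BY_MODE = {
    "Minimization": {
        "Simplex", "Damped dynamics", "FIRE", "Steepest descent",
        "Conjugate gradients", "BFGS", "Simulated annealing", "SGLD",
    },
    "Equilibrium dynamics": DYNAMICS_ALGORITHMS,
    "Nonequilibrium dynamics": DYNAMICS_ALGORITHMS,
    "Free energy": {
        "Harmonic", "Metadynamics", "PMF",
        "Thermodynamic integration", "Umbrella sampling",
    },
}

def classify_algorithm(mode, algorithm):
    """Classify an algorithm value relative to the lists for its mode."""
    listed = ALGORITHMS_BY_MODE.get(mode)
    if listed is None:
        return "no list for this mode"
    return "listed" if algorithm in listed else "user-specified"

fire_in_minimization = classify_algorithm("Minimization", "FIRE")
fire_in_dynamics = classify_algorithm("Equilibrium dynamics", "FIRE")
```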

computation-parameter

Definition: A fixed variable associated with the specified computation algorithm.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

initialization

Definition: Information on the conditions set at the start of the simulation.

Values: String containing a description of the simulation initial conditions (such as the nature and source of an initial structure, initialization of velocities or spins, etc.).

Repeatable: False

particle-style*

Definition: The nature of the discrete entities used in the molecular dynamics simulation.

Values: String containing one of the following values:

  • “Atom” – Each particle represents an individual atom
  • “Radical” – Each particle represents a free radical
  • “United atom” – Each particle represents a fixed grouping of atoms
  • “Bead” – Each particle represents a user-tuneable group of atoms
  • “Coarse grained” – Each particle represents a higher scale particle obtained by scaling up the atomic representation
  • “Mesoscale” – Granular particles used to simulate microstructure at the mesoscopic scale
  • “Classical electron” – Antisymmetrized wave packet representation of valence electrons
  • Other user-specified value if none of the above apply.

Repeatable: False

particle-interactions*

Definition: The model used to describe the potential energy surface of the system.

Values: Contains SubProperties as listed below.

Repeatable: True

Note: This field can be repeated in cases where there are multiple interaction models used in the same simulation (e.g. hybrid calculations combining multiple models or QM/MM style computations).

model-type*

Definition: The kind of potential used to compute the particle interactions.

Values: A string containing the model type (e.g. Lennard-Jones, EAM, MACE, etc.).

Repeatable: False

bonding-type*

Definition: Specifies whether the bonds between particles are immutable or can be broken.

Values: String containing one of the following values:

  • “Reactive” – Bonds formed or broken on-the-fly based on the particle environment
  • “Bonded fixed” – Bonds are defined a priori and cannot be broken during the simulation
  • “Bonded mutable” – Bonds are defined a priori but can be broken and/or formed during the simulation
  • Other user-specified value if none of the above apply.

Repeatable: False

theory-level*

Definition: The rigor with which the atomic interactions are modeled.

Values: String containing one of the following values:

  • “Classical physics-based” – Electronic degrees of freedom subsumed into a model of atomic bonding
  • “Classical machine-learning” – Machine learning model trained to take particle positions as input
  • “Tight-binding” – Classical model of nuclear interactions with approximate treatment of the electrons
  • “Ab initio” – Some level of electronic structure determines the particle interactions
  • “Ab initio machine learning” – Machine learning model employing some level of electronic structure in determining the particle interactions
  • Other user-specified value if none of the above apply.

Repeatable: False

source

Definition: Information that uniquely identifies the origin of the particle interaction model (e.g. a journal article, or a repository containing an implementation or parameter set).

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

charge-interactions

Definition: The model used to describe the interactions between charged particles.

Values: Contains SubProperties as listed below.

Repeatable: True

charge-origin*

Definition: Nature of the particle charges.

Values: String containing one of the following values:

  • “Semi empirical” – Charges obtained from semi-empirical quantum calculations
  • “Quantum” – Charges obtained from quantum calculations
  • “Electronic population” – Charges obtained from electronic population analysis, e.g. Mulliken
  • “Electronegativity” – Charges obtained from electronegativity differences between atoms
  • “Tabulated” – Charges obtained from an external source
  • Other user-specified value if none of the above apply.

Repeatable: False

charge-style*

Definition: The nature of the charges associated with the particles, such as total or effective charges.

Values: String containing one of the following values:

  • “Coulomb full” – The particle charge accounting for all electrons
  • “Coulomb screened” – Effective charge due to screening or shielding by other electrons (e.g. Slater)
  • Other user-specified value if none of the above apply.

Repeatable: False

charge-variability*

Definition: Attribute specifying the method for dynamically varying the local electric charge associated with discrete entities.

Values: String containing one of the following values:

  • “Fixed” – Charges cannot change during the simulation
  • “QEq” – Charge equilibration method
  • “EEM” – Electronegativity Equalization Method
  • “iEL/SCF” – Extended Lagrangian/self-consistent field scheme
  • “FlucQ” – Fluctuating charge potential
  • “Drude” – Drude method
  • Other user-specified value if none of the above apply.

Repeatable: False

long-range-electrostatics

Definition: Method used to compute long-range electrostatic interactions.

Values: String containing one of the following values:

  • “None” – No electrostatic calculation performed
  • “Direct” – All Coulomb interactions computed directly
  • “Ewald” – Ewald summation
  • “PME” – Particle mesh Ewald
  • “PPPM” – Particle particle particle mesh
  • “Reaction” – Reaction field
  • Other user-specified value if none of the above apply.

Repeatable: False

electric-dipole

Definition: Attribute specifying the type of interaction associated with the local electric dipoles associated with particles.

Values: String containing one of the following values:

  • “Stockmayer” – Stockmayer potential
  • “Screened” – Screened dipoles
  • “Long range” – Long-range dipoles
  • Other user-specified value if none of the above apply.

Repeatable: False

electric-dipole-variability

Definition: Attribute specifying the method for dynamically varying the local electric dipole associated with discrete entities.

Values: String containing one of the following values:

  • “Induced” – Induced point dipoles
  • “Isotropic” – Isotropic polarizability
  • “Anisotropic” – Anisotropic polarizability
  • Other user-specified value if none of the above apply.

Repeatable: False

magnetic-interactions

Definition: The model used to describe the interactions between magnetic spins.

Values: Contains SubProperties as listed below.

Repeatable: True

spin-style*

Definition: The nature of the magnetic spins associated with the particles.

Values: String containing one of the following values:

  • “Classical spin dynamics” – Spins modeled as classical vectors evolved via the Landau–Lifshitz–Gilbert (LLG) equation defined by a spin Hamiltonian
  • “Coupled spin-lattice dynamics” – Combines classical spin dynamics with classical molecular dynamics
  • “Ab initio spin dynamics” – Magnetism treated via spin-polarized density functional theory (DFT)
  • “Tight binding spin dynamics” – Magnetism emerges from itinerant electrons rather than fixed local moments
  • “Quantum spin dynamics” – Magnetic moments treated as fully quantum mechanical operators evolving under the time-dependent Schrödinger equation

Repeatable: False

magnetic-dipole

Definition: Attribute specifying the type of interaction (spin Hamiltonian) between the magnetic dipoles associated with particles.

Values: String containing a description of the approach used to model the magnetic dipoles (e.g. “Heisenberg”, “multi-spin”, “Zeeman”, or some other formulation for the spin Hamiltonian).

Repeatable: False

thermodynamic-constraint

Definition: Macroscopic physical restriction imposed on the simulation, such as a statistical mechanics ensemble.

Values: Contains SubProperties as listed below.

Repeatable: True

system*

Definition: The collection of particles to which the constraint is applied.

Values: String describing the constrained system (e.g. entire system, or a subset of atoms that are constrained).

Repeatable: False

type*

Definition: The nature of the imposed thermodynamic constraint.

Values: String containing one of the following values:

  • “NVE” – Microcanonical
  • “NVT” – Canonical
  • “μVT” – Grand canonical
  • “NPT” – Isothermal-isobaric
  • “NσT” – Isothermal-isostress
  • “μPT” – Grand isothermal-isobaric
  • “μσT” – Grand isothermal-isostress
  • “NPH” – Isoenthalpic-isobaric
  • “NσH” – Isoenthalpic-isostress
  • Other user-specified value if none of the above apply.

Repeatable: False

method

Definition: The name of the approach used to impose the constraint.

Values: String containing one of the following values based on the specified constraint type:

  • For “NVT”:
    • “Andersen”
    • “Berendsen”
    • “CSVR”
    • “Langevin”
    • “Nosé-Hoover”
    • “Nosé-Hoover chain”
    • “Velocity rescaling”
    • Other user-specified value if none of the above apply.
  • For “μVT”:
    • “GCMC” – Grand Canonical Monte Carlo
    • Other user-specified value if none of the above apply.
  • For “NPT”:
    • “Andersen”
    • “Berendsen”
    • “CauchyStat”
    • “Nosé-Hoover style”
    • “Parrinello-Rahman”
    • Other user-specified value if none of the above apply.
  • For “NσT”:
    • “CauchyStat”
    • “Parrinello-Rahman”
    • Other user-specified value if none of the above apply.
  • For “μPT”:
    • User-specified value.
  • For “μσT”:
    • User-specified value.
  • For “NPH”:
    • “Nosé-Hoover style”
    • Other user-specified value if none of the above apply.
  • For “NσH”:
    • User-specified value.
  • Otherwise:
    • User-specified value.

Repeatable: False

description

Definition: Explanatory text about the constraint and method used to impose it.

Values: A string containing the constraint description.

Repeatable: False

td-parameter

Definition: A fixed variable associated with the specified thermodynamic constraint.

Values: Contains SubSubProperties as defined in the table below.

Repeatable: True

Table MD-2: Computation parameter SubSubProperties

SubProperty | SubSubProperty | Description
computation-parameter

name*

Definition: The designation of the parameter associated with the specified computation algorithm.

Values: A string containing the name of the computation parameter.

Repeatable: False

value*

Definition: The data associated with the computation parameter.

Values: An integer, real, boolean, or string containing the computation parameter value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation of the meaning, source, or purpose of the computation parameter.

Values: A string explaining the computation parameter (e.g. “The convergence tolerance for energy minimization”, “The simulation time for the dynamical simulation”).

Repeatable: False
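Illustration (non-normative): the Python sketch below parses a computation-parameter element and reports which of name, value, and unit are missing. It assumes that the asterisk marks a required field. The function name and the “timestep” parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fields marked with an asterisk in Table MD-2 (assumed to mean "required").
REQUIRED_TAGS = ("name", "value", "unit")

def missing_required(xml_text):
    """Return the required child tags that are absent or empty."""
    element = ET.fromstring(xml_text)
    return [tag for tag in REQUIRED_TAGS
            if not (element.findtext(tag) or "").strip()]

complete = """<computation-parameter>
  <name>timestep</name>
  <value>1.0</value>
  <unit>fs</unit>
  <description>The integration time step for the dynamical simulation</description>
</computation-parameter>"""

incomplete = """<computation-parameter>
  <name>timestep</name>
  <value>1.0</value>
</computation-parameter>"""

complete_missing = missing_required(complete)
incomplete_missing = missing_required(incomplete)
```

The same check applies unchanged to other parameter groups that share the name, value, unit, description layout.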

Table MD-3: The source of the particle interaction model SubSubProperties

source

Definition: Information that uniquely identifies the origin of the particle interaction model (e.g. a journal article, or a repository containing an implementation or parameter set).

Values: Contains SubSubProperties as listed below.

Repeatable: True

reference*

Definition: The origin of the particle interaction model implementation used, such as an interatomic potential repository (e.g. OpenKIM or the NIST IPR), a code (like LAMMPS), or a publication.

Values: String containing the name of the source and other identifying information (e.g. journal citation, repository and unique identifier in the repository).

Repeatable: False

doi

Definition: The digital object identifier (DOI) for the source.

Values: A string containing the source DOI.

Repeatable: False

link

Definition: A URI pointing to a permanent location of the source.

Values: A string containing the file URI.

Repeatable: False

Table MD-4: Thermodynamic constraint parameter SubSubProperties

SubProperty | SubSubProperty | Description
td-parameter

name*

Definition: The designation of the parameter associated with the specified thermodynamic constraint.

Values: A string containing the name of the thermodynamic constraint parameter.

Repeatable: False

value*

Definition: The data associated with the thermodynamic constraint parameter.

Values: An integer, real, boolean, or string containing the thermodynamic constraint parameter value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation of the meaning, source, or purpose of the thermodynamic constraint parameter.

Values: A string explaining the thermodynamic constraint parameter (e.g. “The damping parameter for the Langevin thermostat”).

Repeatable: False

Example: MD Method-Specific Metadata (Dataset source: https://doi.org/10.6084/m9.figshare.6273170)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <computation>
    <mode>Equilibrium dynamics</mode>
  </computation>
  <particle-style>Atom</particle-style>
  <particle-interactions>
    <model-type>Finnis-Sinclair potential</model-type>
    <bonding-type>Reactive</bonding-type>
    <theory-level>Classical physics-based</theory-level>
  </particle-interactions>
  <thermodynamic-constraint>
    <system>Entire system</system>
    <type>NPT</type>
    <description>Amorphous samples were produced by quenching the molten metal from 2300 to 300 K.</description>
  </thermodynamic-constraint>
</root>

MBPT Method-Specific MatCore Metadata

The metadata for Many-Body Perturbation Theory (MBPT) data (MatCore-MBPT) is tabulated below. This includes the MBPT method, starting point, dielectric treatment, and associated parameters. (Note: This standard inherits the metadata in the minimal MatCore standard and MatCore-DFT standard, adding the properties below.)

Table MBPT-1: MatCore-MBPT metadata

Property | SubProperty | Description
mbpt-method*

Definition: The many-body perturbation theory (MBPT) approach used and associated settings.

Values: Contains SubProperties as listed below.

Repeatable: False

type*

Definition: The nature of the MBPT calculation performed.

Values: String containing one of the following values:

  • “GW”
  • “BSE”
  • “GW/BSE”
  • Other user-specified value if none of the above apply.

Repeatable: False

self-consistency*

Definition: Specifies the level of iterative updating applied to the Green’s function (G), the screened Coulomb interaction (W), or the excitonic Hamiltonian during the calculation. This property identifies whether the MBPT equations are solved in a 'one-shot' perturbative manner or through an iterative cycle where the electronic or excitonic quasiparticle states are updated to reach a self-consistent solution.

Values: String containing one of the following values depending on the variant of the GW, BSE, or GW/BSE approach used:

  • “G0W0” – One-shot calculation; the G and W are constructed using the starting eigenvalues and wavefunctions.
  • “GW0” – Only the Green's function G is updated iteratively, while W remains fixed at the W0 level
  • “G0W” – Only the screened interaction W is updated iteratively, while G remains fixed at G0
  • “scGW” – Fully self-consistent GW; both G and W are updated iteratively until the quasiparticle energies and/or wavefunctions converge
  • “QSGW” – Quasiparticle self-consistent GW; the starting potential is updated to provide the optimal independent particle basis for the GW expansion
  • “BSE0” – One-shot BSE; the excitonic Hamiltonian is built once using fixed GW quasiparticle energies and screening (W)
  • “scBSE” – Self-consistent BSE; internal refinement of the kernel or energies within the BSE process
  • “evGW+BSE” – Eigenvalue-only self-consistent GW (evGW) used as the starting point for a subsequent BSE calculation
  • “scGW+BSE” – The BSE is solved using the results of a fully self-consistent GW calculation
  • Other user-specified value if none of the above apply.

Repeatable: False

starting-point*

Definition: The density functional theory (DFT) exchange-correlation functional used to generate Kohn-Sham wavefunctions and eigenvalues for the Green’s function and the screened Coulomb interaction.

Values: String containing one of the following values:

  • “LDA”
  • “GGA”
  • “Meta GGA”
  • “Hybrid GGA”
  • “DFT+U”
  • Other user-specified value if none of the above apply.

Repeatable: False

dielectric-matrix*

Definition: Information on the dielectric matrix used to compute the screened Coulomb interaction W.

Values: Contains SubProperties as listed below.

Repeatable: False

planewave-basis-cutoff*

Definition: For a planewave code, the basis set cutoff employed for the dielectric matrix. This is the same as the kinetic energy cutoff.

Values: A real number containing the cutoff. Standard units: Rydberg (Ry).

Repeatable: False

Note: Specify either “planewave-basis-cutoff” or “local-orbital-basis-set”, but not both.

local-orbital-basis-set*

Definition: For local orbital codes, the name of the basis set.

Values: A string containing the name of the atomic orbital basis (e.g. “cc-pV6Z”).

Repeatable: False

Note: Specify either “planewave-basis-cutoff” or “local-orbital-basis-set”, but not both.
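Illustration (non-normative): the either/or rule for the dielectric-matrix basis can be checked mechanically. The Python sketch below takes the dielectric-matrix SubProperties as a dictionary keyed by the property names planewave-basis-cutoff and local-orbital-basis-set. The function name and the returned labels are assumptions of this illustration.

```python
def dielectric_basis_kind(dielectric_matrix):
    """Report which basis description is present, or why the entry is invalid."""
    has_planewave = dielectric_matrix.get("planewave-basis-cutoff") is not None
    has_local = dielectric_matrix.get("local-orbital-basis-set") is not None
    if has_planewave and has_local:
        return "invalid: both given"
    if not has_planewave and not has_local:
        return "invalid: neither given"
    return "planewave" if has_planewave else "local-orbital"

planewave_case = dielectric_basis_kind({"planewave-basis-cutoff": 60.0})
local_case = dielectric_basis_kind({"local-orbital-basis-set": "cc-pV6Z"})
conflict_case = dielectric_basis_kind(
    {"planewave-basis-cutoff": 60.0, "local-orbital-basis-set": "cc-pV6Z"})
```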

frequency

Definition: The method used to handle the frequency dependence of the Green’s function, the screened Coulomb interaction, and their product.

Values: String containing one of the following values:

  • “Hybertsen-Louie”
  • “Godby-Needs”
  • “Full frequency real axis”
  • “Full frequency imaginary axis”
  • “Contour deformation”
  • “Spacetime”
  • Other user-specified value if none of the above apply.

Repeatable: False

response-basis-size

Definition: For sum-over-states methods, the number of bands used in the sum. For linear-response methods, the number of dielectric eigenvalues.

Values: One integer containing the number of bands or eigenvalues. Standard units: dimensionless.

Repeatable: False

q-points*

Definition: Brillouin zone grid used for sampling the dielectric matrix.

Values: Ordered list of 3 integer numbers containing the number of points along the three reciprocal lattice vector directions. Standard units: dimensionless.

Repeatable: False

coulomb-truncation

Definition: Method used to eliminate long-range Coulomb interactions between periodic replicas in supercell calculations.

Values: String containing one of the following values:

  • “Ismail-Beigi”
  • “Rozzi”
  • “Spencer-Alavi”
  • Other user-specified value if none of the above apply.

Repeatable: False

gw-bands

Definition: Number of bands for which the GW correction is evaluated.

Values: One integer specifying how many bands are corrected, counting from the bottom of the valence bands. Standard units: dimensionless.

Repeatable: False

bse-hamiltonian

Definition: A collective descriptor for the parameters defining the construction, dimensionality, and solution method of the effective excitonic Bethe-Salpeter Hamiltonian that is being solved in a BSE calculation. It characterizes the Hilbert space size and the physical constraints (momentum and spin) applied during the diagonalization.

Values: Contains SubProperties as listed below.

Repeatable: False

number-valence-bands

Definition: Number of valence bands included in the BSE Hamiltonian.

Values: One integer specifying how many valence bands are included, counting from the valence band top. Standard units: dimensionless.

Repeatable: False

number-conduction-bands

Definition: Number of conduction bands included in the BSE Hamiltonian.

Values: One integer specifying how many conduction bands are included, counting from the conduction band bottom. Standard units: dimensionless.

Repeatable: False

k-point-mesh

Definition: Brillouin zone grid used for building the BSE Hamiltonian.

Values: Ordered list of 3 integer numbers containing the number of points along the three reciprocal lattice vector directions. Standard units: dimensionless.

Repeatable: False

exciton-momentum

Definition: Crystal momentum of the exciton in the BSE calculation.

Values: Ordered list of 3 real values containing the crystal momentum, e.g. [0.5,0.5,1.0]. Standard units: Reciprocal lattice units (2π/a).

Repeatable: False

exciton-multiplicity

Definition: Specifies whether the BSE calculation is performed for singlet or triplet excitons.

Values: String containing one of the following values:

  • “Singlet”
  • “Triplet”
  • Other user-specified value if none of the above apply.

Repeatable: False

diagonalization

Definition: Approximation method used to diagonalize the BSE Hamiltonian.

Values: String containing one of the following values:

  • “Tamm-Dancoff”
  • “Full diagonalization”
  • Other user-specified value if none of the above apply.

Repeatable: False

number-lowest-eigenvalues

Definition: Number of lowest eigenvalues computed by solving the BSE Hamiltonian.

Values: One integer containing the number of BSE eigenvalues. Standard units: dimensionless.

Repeatable: False

bse-kernel-truncation

Definition: Method used to eliminate long-range Coulomb interactions between periodic replicas in the BSE Coulomb kernel.

Values: String containing one of the following values:

  • “Ismail-Beigi”
  • “Rozzi”
  • “Spencer-Alavi”
  • Other user-specified value if none of the above apply.

Repeatable: False

Example: MBPT Method-Specific Metadata (Dataset source: https://archive.materialscloud.org/record/2023.92)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <mbpt-method>
    <type>GW</type>
    <self-consistency>G0W0</self-consistency>
  </mbpt-method>
  <starting-point>GGA</starting-point>
  <dielectric-matrix>
    <planewave-basis-cutoff>60.0</planewave-basis-cutoff>
    <q-points>48</q-points>
  </dielectric-matrix>
</root>

ML Method-Specific MatCore Metadata

The metadata for Machine Learning (ML) data (MatCore-ML) is tabulated below. This metadata documents key aspects of the ML method to provide understanding of the dataset generated by it. (Note: This standard inherits the metadata in the minimal MatCore standard, adding the properties below.)

Table ML-1: MatCore-ML metadata

Property | SubProperty | Description
ml-task*

Definition: Characterization of the machine learning process performed.

Values: Contains SubProperties as listed below.

Repeatable: True

type*

Definition: The nature of the machine learning task.

Values: String containing one of the following values:

  • “Property prediction”
  • “Structure prediction” (e.g. crystal structure prediction)
  • “Structure generation” (e.g. de novo crystal generation)
  • “Synthesis prediction”
  • “Material ranking”
  • “Clustering”
  • “Embedding”
  • Other user-specified value if none of the above apply.

Repeatable: True

Note: This Property can be repeated if the machine learning task involves multiple types of calculations.

description

Definition: Explanation of the machine learning task performed.

Values: A string containing a more detailed explanation of the machine learning task and its objectives.

Repeatable: False

ml-model*

Definition: A comprehensive description of the machine learning approach and architecture used.

Values: Contains SubProperties as listed below.

Repeatable: False

algorithm*

Definition: The machine learning method used.

Values: A string containing a description of the machine learning algorithm used (e.g. neural network, random forest, Gaussian process, etc.)

Repeatable: False

target-variable*

Definition: The desired output or answer the machine learning model aims to predict.

Values: A string containing a description of a target variable the machine learning model aims to predict.

Repeatable: True

input-feature

Definition: Information provided to the machine learning method.

Values: A string containing a description of the form and meaning of an input variable/feature provided to the machine learning model (e.g. a graph representation of a crystal structure). This can include information on feature engineering performed to define the input (e.g. dimensionality reduction, clustering, etc.).

Repeatable: True

model-architecture

Definition: A description of the network topology used by the specified algorithm.

Values: A string containing a description of the model architecture used by the specified algorithm (e.g. for a neural network: the number of layers, nodes per layer, etc.).

Repeatable: False

model-size

Definition: A measure of the magnitude of the machine learning architecture.

Values: A string containing a description of the size of the trained model (e.g. number of parameters, storage space, memory requirements, etc.)

Repeatable: False

training-data

Definition: The information used for parameterizing the machine learning model.

Values: Contains SubProperties as listed below.

Repeatable: True

name*

Definition: Name of the dataset used to train the machine learning model.

Values: String containing the dataset name.

Repeatable: False

contents*

Definition: Description of the information included in the training dataset.

Values: String containing a description of the training data (e.g. DFT calculations, experiments, etc.)

Repeatable: False

source

Definition: Location from which the training data was obtained.

Values: A string containing a MatCore ID, web address, DOI, citation, or other description of the source of the training data.

Repeatable: False

size

Definition: Number of items in the training dataset.

Values: A string containing information on the number of data points, configurations, etc., in the training set.

Repeatable: False

data-preprocessing

Definition: Approach used for preliminary refining of raw, unformatted, or incomplete information into a structured and clean state suitable for modeling or analysis.

Values: A string containing a description of any preprocessing applied to the dataset to prepare it for training the machine learning model.

Repeatable: False

missing-data

Definition: Documentation of incomplete, null, or unrecorded entries within a feature set that hinder algorithm training.

Values: A string containing a description of missing entries in the raw dataset (e.g. percentage of missing data and explanation of which fields are missing and why).

Repeatable: False

missing-data-strategy

Definition: Approach used for filling in or removing blank, null, or unknown entries in a dataset.

Values: A string describing the approach used for handling missing data.

Repeatable: False

outlier-handling

Definition: Approach to identifying and addressing extreme observations that deviate significantly from the general distribution to ensure they do not disproportionately skew model training.

Values: A string containing the method used to detect and handle outliers.

Repeatable: False

dataset-split

Definition: A description of how information used to train the machine learning model was divided into subsets as part of the training process.

Values: A string describing how the dataset was split into training, validation, and test subsets (e.g. the split ratios), and any other relevant information on how the data were distributed during training.

Repeatable: False

training-method

Definition: The approach used to determine the machine learning model’s free parameters.

Values: Contains SubProperties as listed below.

Repeatable: False

training-procedure*

Definition: The methodology used to determine the machine learning model’s free parameters.

Values: A string containing a description of the procedure used to train the machine learning model.

Repeatable: False

training-hyperparameters

Definition: Documentation of the settings used in fitting the model.

Values: A string describing the hyperparameters used in the training of the model (e.g. learning rate, early-stopping, etc.)

Repeatable: False

transfer-learning

Definition: Method used to reuse knowledge from a previously trained model or task to improve or accelerate training for the current machine learning task.

Values: A string containing details of the transfer learning methods used (such as fine-tuning, domain adaptation, multi-task learning, delta learning, etc.).

Repeatable: False

model-performance

Definition: An assessment of the reliability of the machine learning results.

Values: Contains SubProperties as listed below.

Repeatable: False

validation*

Definition: A description of the approach used to determine the reliability of the obtained machine learning results.

Values: A string containing a description of the validation method used (e.g. k-fold cross-validation, hold-out set, etc.) and associated validation measure (e.g. MAE, RMSE, R^2, etc.).

Repeatable: True

uncertainty-quantification

Definition: The method or process used to identify and measure the range of possible outcomes to assess the reliability of machine learning model predictions.

Values: A string containing a description of the method used to assess uncertainty of the machine learning method outputs (e.g. Bayesian methods, dropout, ensemble methods, Gaussian process, etc.)

Repeatable: False

interpretability-method

Definition: A technique used to understand the cause-and-effect relationships within the machine learning model's inputs and outputs.

Values: A string containing details of the approach and results of the interpretability analysis (e.g. feature importance, global surrogate models, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), Saliency Maps, etc.)

Repeatable: False

Example: ML Method-Specific Metadata (Source: https://figshare.com/articles/dataset/WyFormer_generated_structures/29094701)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <ml-task>
    <type>Structure prediction</type>
    <description>Generation of inorganic crystal structures using a method that explicitly accounts for crystal symmetry.</description>
  </ml-task>
  <ml-model>
    <algorithm>Wyckoff Transformer (WyFormer)</algorithm>
    <target-variable>Discrete Wyckoff encodings consisting of a space group identifier and sequence of element, site-symmetry, and enumeration tokens, which can be deterministically expanded into full 3D crystal structures.</target-variable>
    <input-feature>Tokenized representations of the space group, chemical elements, site symmetry labels, and Wyckoff position indices derived from symmetry analysis of crystal structures.</input-feature>
    <model-architecture>A permutation-invariant autoregressive Transformer that generates crystal structures in a discrete Wyckoff representation by modeling the joint distribution of space group and symmetry-constrained site tokens.</model-architecture>
  </ml-model>
  <training-data>
    <name>MP-20</name>
    <contents>Inorganic crystalline materials relaxed using density functional theory (DFT) to a local energy minimum. There are 89 elements and the materials have 1–20 atoms in the unit cell. MP-20 includes most experimentally known materials with no more than 20 atoms in the unit cell.</contents>
    <source>Materials Project, first published in https://doi.org/10.1063/1.4812323</source>
    <size>45,231 structures that differ in both structure and composition.</size>
  </training-data>
</root>
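
The example above leaves several optional properties unset. The fragment below is a minimal sketch of how they might be filled in. Every value is a hypothetical placeholder and does not describe the WyFormer dataset. Placing data-preprocessing and dataset-split under training-data is an assumption inferred from the order of Table ML-1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!-- All values in this fragment are hypothetical placeholders. -->
  <ml-task>
    <!-- type is repeatable: one element per task type -->
    <type>Property prediction</type>
    <type>Material ranking</type>
    <description>Predict formation energies and rank candidate compounds by predicted stability.</description>
  </ml-task>
  <ml-model>
    <algorithm>Random forest</algorithm>
    <target-variable>Formation energy per atom</target-variable>
    <input-feature>Composition-based descriptors</input-feature>
    <model-size>500 trees</model-size>
  </ml-model>
  <training-data>
    <name>Hypothetical training set</name>
    <contents>DFT calculations</contents>
    <!-- Assumed to nest under training-data -->
    <data-preprocessing>Duplicate entries removed; features standardized to zero mean and unit variance.</data-preprocessing>
    <dataset-split>80/10/10 training/validation/test split, chosen at random.</dataset-split>
  </training-data>
  <training-method>
    <training-procedure>Model fitted on the training subset; settings selected on the validation subset.</training-procedure>
    <training-hyperparameters>Number of trees: 500; maximum depth: 20.</training-hyperparameters>
  </training-method>
  <model-performance>
    <!-- validation is repeatable: one element per validation approach -->
    <validation>5-fold cross-validation, MAE</validation>
    <validation>Hold-out test set, RMSE</validation>
    <uncertainty-quantification>Ensemble methods</uncertainty-quantification>
  </model-performance>
</root>
```

Only the properties marked with an asterisk in Table ML-1 are required. The others are shown here solely to illustrate where they would appear.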

PF Method-Specific MatCore Metadata

The metadata for Phase Field (PF) data (MatCore-PF) is tabulated below. This metadata documents key aspects of the PF method to provide understanding of the dataset generated by it. This includes the fundamental mathematical and physical framework governing the PF simulation, including the governing equations, thermodynamic free energy models, and time-evolution state. (Note: This standard inherits the metadata in the minimal MatCore standard, adding the properties below.)

Table PF-1: MatCore-PF metadata

Property | Subproperty | Description
physical-phenomena*

Definition: Material process being modeled; description of the overarching processes and mechanisms being modeled/simulated.

Values: String containing one of the following values:

  • "Isothermal solidification"
  • "Directional solidification"
  • "Isothermal annealing"
  • "Sintering"
  • "Twinning"
  • "Grain growth"
  • "Spinodal decomposition"
  • Other user-specified value if none of the above apply.

Repeatable: True

problem-specification*

Definition: The fundamental mathematical and physical framework governing the phase field simulation, including the governing equations, thermodynamic free energy models, and time-evolution state.

Values: Contains SubProperties as listed below.

Repeatable: False

time-dependence*

Definition: Description of the time-evolution state targeted by the simulation.

Values: String containing one of the following values:

  • "Transient evolution"
  • "Steady state"
  • "Stationary"
  • "Thermodynamic equilibrium"
  • Other user-specified value if none of the above apply.

Repeatable: False

governing-equation*

Definition: The specific phase field equation used to evolve a field variable.

Values: Contains SubSubProperties as defined in Table PF-2.

Repeatable: True

free-energy*

Definition: The thermodynamic free energy function used in the system.

Values: Contains SubSubProperties as defined in Table PF-3.

Repeatable: False

additional-energy-contribution

Definition: Physical contributions to the free energy function.

Values: String containing one of the following values:

  • "Elasticity"
  • "Plasticity"
  • "Electrical"
  • "Magnetic"
  • "Fluid-Solid-Interaction"
  • "Solid-Solid-Interaction"
  • Other user-specified value if none of the above apply.

Repeatable: True

domain-and-mesh

Definition: The spatial configuration, computational region, and topological discretization details.

Values: Contains SubProperties as listed below.

Repeatable: False

dimensionality*

Definition: The spatial dimensions of the computational domain.

Values: String containing one of the following values:

  • "1D"
  • "2D"
  • "3D"
  • Other user-specified value if none of the above apply.

Repeatable: False

coordinate-system

Definition: The structured framework used to uniquely determine and standardize the position of the points in space and the symmetry assumed.

Values: String containing one of the following values:

  • "Cartesian"
  • "Cylindrical"
  • "Spherical"
  • Other user-specified value if none of the above apply.

Repeatable: False

mesh-type

Definition: The topology or distribution of the discrete grid points.

Values: String containing one of the following values:

  • "Regular"
  • "Cartesian"
  • "Rectilinear"
  • "Skewed"
  • "Curvilinear"
  • "Unstructured"
  • "Adaptive Mesh Refinement"
  • "Meshless"
  • Other user-specified value if none of the above apply.

Repeatable: False

domain-size

Definition: Specification of the computational region extents.

Values: An ordered list of real numbers containing the domain dimensions in meters (m).

Repeatable: False

boundary-conditions

Definition: Constraints applied to the computational domain perimeter.

Values: String containing one of the following values:

  • "NoFlux"
  • "Periodic"
  • "Fixed"
  • "Free"
  • "Dirichlet"
  • "Neumann"
  • Other user-specified value if none of the above apply.

Repeatable: True

initial-conditions

Definition: The spatial distribution of the field variables at the start of the simulation.

Values: A string describing the initialization method (e.g., "Uniform constant", "Random noise", "Geometric seed", "Imported array", a specific mathematical function, or other user-specified method)

Repeatable: True

numerical-method*

Definition: A description of the computational framework used to solve the phase field equations.

Values: Contains SubProperties as listed below.

Repeatable: True

spatial-discretization*

Definition: The numerical approach used to convert continuous governing equations into a system of algebraic equations defined over a discrete mesh or set of points within the computational domain.

Values: String containing one of the following values:

  • "FDM" – Finite Difference Method
  • "FEM" – Finite Element Method
  • "FVM" – Finite Volume Method
  • "LBM" – Lattice Boltzmann Method
  • "Particle"
  • "Spectral"
  • Other user-specified value if none of the above apply.

Repeatable: False

spatial-accuracy-order

Definition: The exponent of the grid spacing parameter (Δx) determining the rate at which the discretization error approaches zero as the mesh is refined.

Values: An integer containing the order of accuracy. Standard units: dimensionless.

Repeatable: False

temporal-discretization

Definition: The algorithm, step-size approach, and integration order used to advance the governing equations of a simulation in time.

Values: String containing one of the following values:

  • "Forward Euler"
  • "Backward Euler"
  • "Adams Bashforth"
  • "Implicit"
  • "Explicit"
  • "Crank-Nicolson"
  • "Runge-Kutta"
  • "Mixed"
  • Other user-specified value if none of the above apply.

Repeatable: False

temporal-accuracy-order

Definition: The exponent of the time step parameter (Δt) determining the rate at which the discretization error approaches zero as the time step tends to zero.

Values: An integer containing the order of accuracy. Standard units: dimensionless.

Repeatable: False

grid-spacing

Definition: Defines the physical distance between adjacent discretization points in a numerical domain.

Values: Contains SubSubProperties as defined in Table PF-4.

Repeatable: True

time-step-size

Definition: The discrete duration of progression between consecutive stages of a simulation, determining the interval for numerical integration, resolution of transient phenomena, and stability of the governing equations.

Values: Contains SubSubProperties as defined in Table PF-5.

Repeatable: True

solver

Definition: Specific numerical algorithm or backend used for solving the equations.

Values: A string containing the solver name (e.g., "Newton-Krylov", "GMRES").

Repeatable: True

variables

Definition: Explicit data structure capturing changing and unchanging elements of the physics model.

Values: Contains SubProperties as listed below.

Repeatable: False

model-parameter

Definition: A specific, adjustable, or calibrated numeric value that controls certain aspects of the simulation.

Values: Contains SubSubProperties as defined in Table PF-6.

Repeatable: True

field-variable

Definition: A physical quantity that can evolve during the simulation.

Values: Contains SubSubProperties as defined in Table PF-7.

Repeatable: True

Table PF-2: Governing Equation SubSubProperties

SubProperty | SubSubproperty | Description
governing-equation* | type*

Definition: The mathematical classification of the phase field equation.

Values: String containing one of the following values:

  • "CH" – Cahn-Hilliard
  • "AC" – Allen-Cahn
  • "GP" – Grand Potential
  • "FID" – Finite Interface Dissipation
  • "PFC" – Phase Field Crystal
  • "Fluid"
  • "Multi"
  • "Coupled"
  • "Diffuse"
  • Other user-specified value if none of the above apply.

Repeatable: False

evolved-variable*

Definition: The name of the field-variable that this equation solves for.

Values: A string matching a defined field-variable name (e.g., "phi", "c").

Repeatable: False

driving-energy

Definition: A free energy component that acts as a forcing term in this equation.

Values: A string matching one of the defined free-energy names (e.g. "f_chem", "f_elastic").

Repeatable: True
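
Because governing-equation is repeatable, a model that evolves several fields lists one element per equation. The fragment below is a minimal sketch, assuming a hypothetical coupled model with a conserved concentration "c" and a non-conserved order parameter "phi". These names and "f_chem" are illustrative and would need to match field-variable and free-energy names defined in the same metadata file.

```xml
<problem-specification>
  <time-dependence>Transient evolution</time-dependence>
  <!-- Cahn-Hilliard equation for the conserved field c -->
  <governing-equation>
    <type>CH</type>
    <evolved-variable>c</evolved-variable>
    <driving-energy>f_chem</driving-energy>
  </governing-equation>
  <!-- Allen-Cahn equation for the non-conserved field phi -->
  <governing-equation>
    <type>AC</type>
    <evolved-variable>phi</evolved-variable>
    <driving-energy>f_chem</driving-energy>
  </governing-equation>
  <!-- The required free-energy element is omitted here for brevity. -->
</problem-specification>
```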

Table PF-3: Free energy SubSubProperties

SubProperty | SubSubproperty | Description
free-energy* | name*

Definition: The identifier or symbol used for this specific energy functional.

Values: A string (e.g., "f_chem", "Chemical Free Energy").

Repeatable: False

description*

Definition: The type/derivation of the thermodynamic bulk free energy function.

Values: String containing one of the following values:

  • "Ideal"
  • "Regular"
  • "Flory"
  • "Calphad"
  • "Poly"
  • "Non-polynomial"
  • Other user-specified value if none of the above apply.

Repeatable: False

expression

Definition: The exact mathematical formulation of the phase field functional.

Values: A string containing a code snippet, LaTeX formula, or SymPy expression.

Repeatable: False

unit

Definition: The physical unit of the free energy density.

Values: A string in the GNU unit convention (or the text “dimensionless”).

Repeatable: False
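
As an illustration of Table PF-3, the fragment below sketches a free-energy entry whose expression is given as a LaTeX formula. The double-well polynomial, the name "f_chem", and the unit are assumptions chosen for the example. The barrier height W is assumed to be declared separately as a model-parameter.

```xml
<free-energy>
  <name>f_chem</name>
  <description>Poly</description>
  <!-- LaTeX form of a double-well polynomial in phi (hypothetical choice) -->
  <expression>W \phi^{2} (1 - \phi)^{2}</expression>
  <!-- Energy density in GNU Units syntax -->
  <unit>J/m^3</unit>
</free-energy>
```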

Table PF-4: Grid spacing SubSubProperties

SubProperty | SubSubproperty | Description
grid-spacing | name*

Definition: The designation of the grid spacing parameter.

Values: A string containing the name of the spacing metric (e.g., "dx", "dy", "dz", "min", "max", "average").

Repeatable: False

value*

Definition: The data associated with the grid spacing parameter.

Values: A real number containing the grid spacing value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation providing further clarification on the grid spacing parameter.

Values: A string explaining the grid spacing parameter.

Repeatable: False
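
When the mesh is not uniform, the "min" and "max" names listed above can bound the spacing. The fragment below is a minimal sketch for a hypothetical adaptively refined finite element mesh. All numeric values and descriptions are placeholders.

```xml
<numerical-method>
  <spatial-discretization>FEM</spatial-discretization>
  <!-- grid-spacing is repeatable: one element per spacing metric -->
  <grid-spacing>
    <name>min</name>
    <value>0.5</value>
    <unit>nm</unit>
    <description>Finest element size, used near interfaces (hypothetical value)</description>
  </grid-spacing>
  <grid-spacing>
    <name>max</name>
    <value>8.0</value>
    <unit>nm</unit>
    <description>Coarsest element size, used in the bulk (hypothetical value)</description>
  </grid-spacing>
</numerical-method>
```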

Table PF-5: Time step size SubSubProperties

SubProperty | SubSubproperty | Description
time-step-size | name*

Definition: The designation of the time step size parameter.

Values: A string containing the name of the time step metric (e.g., "dt", "initial", "min", "max").

Repeatable: False

value*

Definition: The data associated with the time step size parameter.

Values: A real number containing the time step size value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation providing further clarification on the time step size parameter.

Values: A string explaining the time step size parameter.

Repeatable: False
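
For an adaptive time integrator, the repeatable time-step-size property can record the starting step together with its bounds. The fragment below is a sketch under that assumption. The values and the choice of seconds as the unit are hypothetical.

```xml
<numerical-method>
  <spatial-discretization>FDM</spatial-discretization>
  <temporal-discretization>Runge-Kutta</temporal-discretization>
  <!-- One time-step-size element per metric (hypothetical values) -->
  <time-step-size>
    <name>initial</name>
    <value>1.0e-6</value>
    <unit>s</unit>
    <description>Step size used for the first time step</description>
  </time-step-size>
  <time-step-size>
    <name>min</name>
    <value>1.0e-8</value>
    <unit>s</unit>
  </time-step-size>
  <time-step-size>
    <name>max</name>
    <value>1.0e-3</value>
    <unit>s</unit>
  </time-step-size>
</numerical-method>
```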

Table PF-6: Model parameter SubSubProperties

SubProperty | SubSubproperty | Description
model-parameter | name*

Definition: The designation of the model parameter.

Values: A string containing the name of the model parameter (e.g. “kappa”, “W”, “mobility”).

Repeatable: False

value*

Definition: The data associated with the model parameter.

Values: An integer, real, boolean, or string containing the parameter value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation providing further clarification on the model parameter.

Values: A string explaining the model parameter (e.g. “Gradient energy coefficient”).

Repeatable: False
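
The fragment below sketches model-parameter entries that carry dimensional units and a boolean value. All names and numbers are hypothetical. The units assume a gradient energy density of the form (kappa/2)|grad phi|^2 with energy density in J/m^3 and an Allen-Cahn mobility multiplying the variational derivative of the free energy.

```xml
<variables>
  <!-- Hypothetical values; units are written in GNU Units syntax -->
  <model-parameter>
    <name>kappa</name>
    <value>1.0e-10</value>
    <unit>J/m</unit>
    <description>Gradient energy coefficient</description>
  </model-parameter>
  <model-parameter>
    <name>mobility</name>
    <value>1.0e-5</value>
    <unit>m^3/(J s)</unit>
    <description>Allen-Cahn interface mobility</description>
  </model-parameter>
  <!-- A boolean switch still carries the required unit property -->
  <model-parameter>
    <name>use_elasticity</name>
    <value>false</value>
    <unit>dimensionless</unit>
    <description>Whether an elastic energy contribution is included</description>
  </model-parameter>
</variables>
```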

Table PF-7: Field variable SubSubProperties

SubProperty | SubSubproperty | Description
field-variable | name*

Definition: The designation of the field variable.

Values: A string containing the name of the field variable (e.g. “phi”, “c”, “T”).

Repeatable: False

type*

Definition: The kind of data associated with the field variable.

Values: String containing one of the following values:

  • “Scalar”
  • “Vector”
  • “Tensor”
  • Other user-specified value if none of the above apply.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation providing further clarification on the field variable.

Values: A string explaining what the field variable represents (e.g. “phase field order parameter”, “solute concentration”).

Repeatable: False
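
Table PF-7 also covers non-scalar fields. The fragment below is a minimal sketch with one scalar and one vector field. The names "c" and "u", their units, and their descriptions are illustrative assumptions.

```xml
<variables>
  <field-variable>
    <name>c</name>
    <type>Scalar</type>
    <unit>dimensionless</unit>
    <description>Solute concentration (mole fraction)</description>
  </field-variable>
  <!-- A vector field, e.g. for a model with an elastic contribution -->
  <field-variable>
    <name>u</name>
    <type>Vector</type>
    <unit>m</unit>
    <description>Displacement field</description>
  </field-variable>
</variables>
```

Each name defined here is what an evolved-variable entry in a governing-equation would refer to.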

Example: PF Method-Specific Metadata (Source: https://zenodo.org/records/7319845)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <physical-phenomena>Homogeneous nucleation (single seed)</physical-phenomena>
  <problem-specification>
    <time-dependence>Transient evolution</time-dependence>
    <governing-equation>
      <type>AC</type>
      <evolved-variable>phi</evolved-variable>
      <driving-energy>f_total</driving-energy>
    </governing-equation>
    <free-energy>
      <name>f_total</name>
      <description>Non-dimensionalized Ginzburg-Landau</description>
      <expression>0.5 * (nabla phi)^2 + g(phi) - delta_f * p(phi)</expression>
      <unit>dimensionless</unit>
    </free-energy>
  </problem-specification>
  <domain-and-mesh>
    <dimensionality>2D</dimensionality>
    <coordinate-system>Cartesian</coordinate-system>
    <mesh-type>Regular</mesh-type>
    <domain-size>100.0</domain-size>
    <boundary-conditions>Periodic</boundary-conditions>
    <initial-conditions>phi(r) = 0.5 * [1 - tanh((r - r_0) / sqrt(2))] where r = sqrt(x^2 + y^2)</initial-conditions>
  </domain-and-mesh>
  <numerical-method>
    <spatial-discretization>FVM</spatial-discretization>
    <spatial-accuracy-order>2</spatial-accuracy-order>
    <temporal-discretization>Implicit</temporal-discretization>
    <temporal-accuracy-order>1</temporal-accuracy-order>
    <grid-spacing>
      <name>dx</name>
      <value>1.0</value>
      <unit>dimensionless</unit>
    </grid-spacing>
    <grid-spacing>
      <name>dy</name>
      <value>1.0</value>
      <unit>dimensionless</unit>
    </grid-spacing>
    <time-step-size>
      <name>dt</name>
      <value>0.01</value>
      <unit>dimensionless</unit>
    </time-step-size>
    <solver>FiPy Default Linear Solver</solver>
  </numerical-method>
  <variables>
    <model-parameter>
      <name>delta_f</name>
      <value>0.04714045</value>
      <unit>dimensionless</unit>
      <description>Non-dimensional nucleation driving force (sqrt(2)/30)</description>
    </model-parameter>
    <model-parameter>
      <name>r_0</name>
      <value>5.05</value>
      <unit>dimensionless</unit>
      <description>Initial diffuse seed radius (e.g., 1.01 * r* for supercritical run)</description>
    </model-parameter>
    <field-variable>
      <name>phi</name>
      <type>Scalar</type>
      <unit>dimensionless</unit>
      <description>Non-conserved phase field variable (0=liquid, 1=solid)</description>
    </field-variable>
  </variables>
</root>

Derived Property MatCore Metadata

The metadata for derived properties (MatCore-DER) is tabulated below. This refers to properties computed from data generated using a CMS method documented elsewhere in this section (e.g. spectroscopy computations based on DFT data). (Notes: (1) This standard inherits the metadata in the minimal MatCore standard and in the extension for the associated method, adding the properties below. (2) Derived properties in this category are limited to properties based on a single method. Derived properties involving a combination of multiple methods (e.g. MD and ML) are not supported. An exception is MBPT, which itself is based on DFT.)

Examples of derived properties include:

  • Electronic (band structure, density of states (DOS), projected DOS, dielectric function, charge and bonding analysis, etc.)
  • Electron-phonon interactions (superconducting properties, line widths, etc.)
  • Linear-response (phonon modes, dispersion, interatomic force constants, vibrational DOS, etc.)
  • Magnetic (magnetics, magnetization, magnons, etc.)
  • Microscopy, electron (S/TEM, 4D-STEM, SAED, etc.)
  • Microscopy, scanning probe (STM, AFM, etc.)
  • NMR (chemical shifts)
  • Optical (electronic excitation spectra, low-loss EELS, etc.)
  • Spectroscopy, core-level (XAS, XES, core-loss EELS, XPS, etc.)
  • Spectroscopy, vibrational (Raman, Infrared spectroscopy, etc.)
  • Transport (electronic conductivity, thermal conductivity, etc.)
  • X-ray scattering (diffraction, reflectivity, pair-distribution function, etc.)

Table DER-1: MatCore-DER metadata

Property | Subproperty | Description
derived-property*

Definition: Details of a quantity related to material behavior computed based on a computational materials science (CMS) calculation covered by the MatCore standard.

Values: Contains SubProperties as listed below.

Repeatable: True

type*

Definition: Kind of physical quantity being computed.

Values: String containing one of the following values, which defines the type of the computed derived property:

  • “Electronic”
  • “Electron-phonon interactions”
  • “Linear-response”
  • “Magnetic”
  • “Microscopy, electron”
  • “Microscopy, scanning probe”
  • “NMR”
  • “Optical”
  • “Spectroscopy, core-level”
  • “Spectroscopy, vibrational”
  • “Transport”
  • “X-ray scattering”
  • Other user-specified value if none of the above apply.

Repeatable: False

description*

Definition: Explanation of the derived property.

Values: A string containing an explanation of the specific derived property that was computed.

Repeatable: False

calculation-method*

Definition: Details regarding an approach used to obtain the derived property. Multiple approaches may be involved in the computation.

Values: Contains SubSubProperties as defined in Table DER-2.

Repeatable: True

Table DER-2: Calculation method SubSubProperties

SubProperty | SubSubproperty | Description
calculation-method* | description*

Definition: Description of the method used to obtain the derived property.

Values: A string containing a human-readable description of the approach and/or algorithm used to compute the derived property.

Repeatable: False

calculation-parameter

Definition: A parameter associated with the specified calculation method.

Values: Contains SubSubSubProperties as defined in Table DER-3.

Repeatable: True

Table DER-3: Calculation parameter SubSubSubProperties

SubSubProperty | SubSubSubproperty | Description
calculation-parameter | name*

Definition: The designation of the parameter associated with the specified derived property calculation method.

Values: A string containing the name of the calculation parameter.

Repeatable: False

value*

Definition: The data associated with the calculation parameter.

Values: An integer, real, boolean, or string containing the calculation parameter value.

Repeatable: False

unit*

Definition: A standardized string or identifier that defines the dimension or measurement system of the associated value.

Values: A string containing the units of the value in the GNU unit convention (or the text “dimensionless”).

Repeatable: False

description

Definition: An explanation of the meaning, source, or purpose of the calculation parameter.

Values: A string explaining the calculation parameter (e.g. “The spin-spin coupling constant for an NMR spectroscopy calculation”).

Repeatable: False

Example: Derived Property Metadata (Source: https://archive.materialscloud.org/record/2024.116)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <derived-property>
    <type>Spectroscopy, vibrational</type>
    <description>Raman spectra plots of intensity versus wavenumber (frequency) representing inelastic light scattering (Raman effect) from a sample.</description>
    <calculation-method>
      <description>The Raman spectrum is obtained in two steps: (1) Phonon eigenmodes and frequencies are computed for a given crystalline material within the harmonic approximation using force constants derived from density functional theory (DFT). (2) The Raman tensor is obtained by numerically differentiating the electronic susceptibility with respect to atomic displacements along each vibrational mode and using these tensors to compute Raman scattering intensities for all Raman-active modes, ultimately yielding a full Raman spectrum after appropriate averaging and broadening.</description>
    </calculation-method>
  </derived-property>
</root>
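
The example above does not use the optional calculation-parameter property. The fragment below is a minimal sketch of how parameters could be attached to a calculation method, using a hypothetical density of states calculation. The parameter names and values are placeholders and are not taken from any published dataset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <derived-property>
    <type>Electronic</type>
    <description>Electronic density of states (DOS) as a function of energy.</description>
    <calculation-method>
      <description>DOS obtained by Gaussian broadening of DFT eigenvalues on a uniform energy grid.</description>
      <!-- calculation-parameter is repeatable: one element per parameter -->
      <calculation-parameter>
        <name>smearing-width</name>
        <value>0.05</value>
        <unit>eV</unit>
        <description>Width of the Gaussian broadening (hypothetical value)</description>
      </calculation-parameter>
      <calculation-parameter>
        <name>energy-grid-points</name>
        <value>2001</value>
        <unit>dimensionless</unit>
        <description>Number of points in the energy grid (hypothetical value)</description>
      </calculation-parameter>
    </calculation-method>
  </derived-property>
</root>
```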