The minimal MatCore metadata comprise those aspects of a dataset that are deemed important to report for data findability and reuse and that are universal across all computational methods. The minimal metadata properties are tabulated below.
| Property | SubProperty | Description |
| creator* | | Definition: The author who generated the data and institutional affiliation.
Values: Contains SubProperties as listed below.
Repeatable: True |
| name* | Definition: The name of the author who generated the data.
Values: A string containing the author name.
Repeatable: False |
| affiliation* | Definition: The affiliation of the author who generated the data.
Values: A string containing the author affiliation.
Repeatable: True |
| title* | | Definition: A single sentence description of the dataset.
Values: A string containing the dataset title.
Repeatable: False |
| creation-date* | | Definition: The calendar date when the dataset was created.
Values: A string containing the creation date in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard).
Repeatable: False |
| description* | | Definition: A synopsis of the dataset, its contents, and purpose.
Values: A string containing a description of the dataset contents. This can optionally include information on the format in which the dataset is stored (e.g. pointer to a relevant schema).
Repeatable: False |
| disclaimer | | Definition: A statement of applicability provided by the creator(s) informing users of the intended use and/or limitations of this dataset.
Values: A string containing the dataset disclaimer.
Repeatable: False |
| material* | | Definition: A description of the chemical composition and structure of a substance in this dataset.
Values: Contains SubProperties as listed below.
Repeatable: True |
| phase* | Definition: The structure of the material included in the dataset.
Values: A list of strings containing one or more of the following values:
- “Amorphous” – A solid material without crystalline order
- “Crystal” – A solid material characterized by a repeating pattern of atoms.
- “Quasicrystal” – A solid material with long-range order, but without translational periodicity.
- “Molecule” – A finite group of atoms bonded together forming an independent unit.
- “Liquid” – A fluid structure (e.g. the molten state of a material)
- “Gas” – A structure where atoms are far apart and only interact during collisions.
- “Plasma” – A gas comprised of ions and electrons.
- Other user-specified value if none of the above apply.
Repeatable: False |
| description | Definition: An explanation of the nature of the material, e.g. a crystal structure designation, a chemical formula for a molecule, etc.
Values: A string containing the material description.
Repeatable: False |
| constituent* | Definition: A chemical element included in the material and its concentration.
Values: Contains SubSubProperties as defined in the table below.
Repeatable: True |
| microstructure | Definition: The arrangement of phases, grains and defects in a material.
Values: A string containing a description of the microstructure of the material.
Repeatable: False |
| computation* | | Definition: A description of a computer simulation used to generate data in the dataset.
Values: Contains SubProperties as listed below.
Repeatable: True |
| method-class* | Definition: The general category to which the computational technique belongs.
Values: A string containing one of the following values:
- “Electronic”
- “Atomistic”
- “Mesoscopic”
- “Continuum”
- “Data-driven”
- Other user-specified value if none of the above apply.
Repeatable: False
Note: The categories listed above are based on the MODA standard. |
| method* | Definition: The computational materials science (CMS) approach used in the computation.
Values: A string containing one of the following values:
For method-class “Electronic”:
- “CC” – Coupled Cluster
- “QMC” – Quantum Monte Carlo
- “DFT” – Density Functional Theory
- “MBPT” – Many-Body Perturbation Theory methods (such as GW and BSE)
For method-class “Atomistic”:
- “MC” – Monte Carlo
- “MD” – Molecular Dynamics
For method-class “Mesoscopic”:
- “DDD” – Discrete Dislocation Dynamics
- “KMC” – Kinetic Monte Carlo
- “CGMD” – Coarse-grained MD
For method-class “Continuum”:
For method-class “Data-driven”:
- “ML” – Machine Learning
- Other user-specified value if none of the above apply.
Repeatable: False |
| simulation-conditions* | Definition: The interactions between the system being modeled and the rest of the world maintained during the computation.
Values: Contains SubSubProperties as defined in the table below.
Repeatable: False |
| software* | Definition: Identification of a computer package used to perform the calculations.
Values: Contains SubSubProperties as defined in the table below.
Repeatable: True |
| citation | | Definition: Information that uniquely identifies a source being acknowledged.
Values: Contains SubProperties as listed below.
Repeatable: True |
| reference* | Definition: Text that uniquely identifies the source of information being cited.
Values: A string containing the reference to the source.
Repeatable: False |
| doi | Definition: The digital object identifier (DOI) for the source.
Values: A string containing the source DOI.
Repeatable: False |
| link | Definition: A URI pointing to a permanent location of the source.
Values: A string containing the file URI.
Repeatable: False |
| funding | | Definition: Information about received monetary support or other resources to generate the dataset.
Values: Contains SubProperties as listed below.
Repeatable: True |
| award-title* | Definition: Name of the grant that provided funding to generate the dataset.
Values: A string containing the award title.
Repeatable: False |
| funder* | Definition: The name of the funding agency that provided money and/or resources to generate the dataset.
Values: A string containing the name of the funding agency, using its Crossref Open Funder Registry designation.
Repeatable: False |
| award-number | Definition: A funder identifier for the grant.
Values: A string containing the award number.
Repeatable: False |
| related-content | | Definition: Other datasets that are connected to the current one in some manner.
Values: Contains SubProperties as listed below.
Repeatable: True |
| links* | Definition: Pointer to one or more related datasets.
Values: A list of permanent pointers to related datasets (such as MatCore IDs, DOIs, URIs, etc.).
Repeatable: False |
| description | Definition: Explanation of the relationship between the related content and the current dataset.
Values: A string containing a description of the related content.
Repeatable: False |
| provenance | | Definition: History of the dataset, detailing its origins and transformations.
Values: Contains SubProperties as listed below.
Repeatable: True |
| event-type* | Definition: Description of change made to the dataset.
Values: A string containing one of the following values:
- “Initial creation” – Initial creation of the dataset
- “Admin update” – Editorial update to the dataset not requiring a version update (e.g. spelling correction).
- “Version update” – Change to the dataset or associated files requiring a new version
- “Metadata update” – Change to the metadata, but not the dataset itself.
- Other user-specified value if none of the above apply.
Repeatable: False |
| date* | Definition: The date when the change was made.
Values: A string containing the date of the change in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard).
Repeatable: False |
| agent* | Definition: Identity of the entity responsible for the change.
Values: A string containing the name or other designation identifying who or what is responsible for the change.
Repeatable: False |
| comments | Definition: Explanation for the change in provenance.
Values: A string containing an explanation for the nature of the changes leading to the revised provenance.
Repeatable: False |
| checksum | Definition: A digital fingerprint for the dataset or its associated files.
Values: A list containing two strings: the name of a file, and the checksum for that file (e.g. its SHA-256 hash).
Repeatable: False |
| matcore-id* | | Definition: An identifier for the dataset.
Values: A string containing the MatCore identifier.
Repeatable: False |
| matcore-date* | | Definition: The calendar date when this MatCore document was created.
Values: A string containing this document's creation date in YYYY-MM-DD format (following the ISO 8601 Date and Time Format Standard).
Repeatable: False |
| license* | | Definition: A contract defining the terms and conditions under which the dataset can be used.
Values: A string containing the dataset license in a format conforming to the SPDX Standard.
Repeatable: False |
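
As an illustration of how the table above might be applied, the following Python sketch checks a record for the top-level properties marked with an asterisk, validates the YYYY-MM-DD date format, and builds a checksum pair. It rests on several assumptions that are not part of the table: that the asterisk marks a required property, that a record is held as a Python dictionary, and that SHA-256 is the chosen checksum. The function names and every value in the sample record, including the `matcore-id` placeholder, are hypothetical.

```python
import hashlib
import re
from datetime import date

# Top-level properties marked with an asterisk in the table above
# (assumed here to mean "required").
REQUIRED_TOP_LEVEL = (
    "creator", "title", "creation-date", "description", "material",
    "computation", "matcore-id", "matcore-date", "license",
)

_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def missing_required(record):
    """Return the asterisked top-level properties absent from `record`."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in record]


def is_iso_date(text):
    """Return True if `text` is a valid calendar date written as YYYY-MM-DD."""
    if not _DATE_PATTERN.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def checksum_entry(filename, data):
    """Build a [file name, SHA-256 hex digest] pair from the file's bytes."""
    return [filename, hashlib.sha256(data).hexdigest()]


# Hypothetical, deliberately incomplete record used only to exercise the checks.
record = {
    "creator": [{"name": "A. Author", "affiliation": ["Example University"]}],
    "title": "Example dataset title",
    "creation-date": "2024-01-15",
    "description": "Placeholder description.",
    "matcore-id": "PLACEHOLDER-ID",  # hypothetical; not a real identifier format
    "matcore-date": "2024-01-15",
}

print(missing_required(record))  # ['material', 'computation', 'license']
print(is_iso_date(record["creation-date"]))  # True
```

The date check combines a pattern match with `date.fromisoformat` so that both the layout (zero-padded, hyphen-separated) and the calendar validity (no 30 February) are tested.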
| SubProperty | SubSubProperty | Description |
| simulation-conditions* | type* | Definition: The nature of the simulation being performed, whether equilibrium, nonequilibrium, or nonstandard.
Values: String containing one of the following values defining the simulation conditions:
- “Equilibrium” – Thermodynamic equilibrium
- “Nonequilibrium” – Conforming to some definition of nonequilibrium thermodynamics
- “Nonstandard” – A simulation not fitting into the equilibrium and nonequilibrium categories.
Note 1: The SubSubProperties listed below (aside from description) are provided as appropriate given the nature of the simulation. Providing a SubSubProperty means that its value is imposed on the simulation as an external constraint.
Note 2: Units for dimensioned variables are to be given in the SI unit system.
Repeatable: False |
| description | Definition: Explanatory text clarifying the simulation type selection.
Values: A string containing the simulation type explanation.
Repeatable: False |
| number-of-particles | Definition: A count of atoms or other discrete entities comprising the material.
Values: An integer containing the number of particles N. Standard units: dimensionless.
Repeatable: False |
| volume | Definition: The amount of space occupied by the material.
Values: A real number containing the volume V. Standard units: cubic meters (m³).
Repeatable: False |
| mass-density | Definition: The total mass of the material divided by the volume it occupies.
Values: A real number containing the mass density 𝜌. Standard units: kilograms per cubic meter (kg/m³).
Repeatable: False |
| number-density | Definition: The total number of particles comprising the material divided by the volume it occupies.
Values: A real number containing the number density n. Standard units: inverse cubic meter (1/m³).
Repeatable: False |
| cell | Definition: The domain occupied by the material in the simulation (if fixed).
Values: Ordered list of three vectors (a, b, c), each with three components, that define the simulation cell. The vectors (a, b, c) need not be orthogonal, and the magnitudes (|a|, |b|, |c|) are the supercell vector lengths. Standard units: meter (m).
Repeatable: False |
| cell-reference | Definition: The domain occupied by the material in the simulation in the reference configuration relative to which strains are defined.
Values: Ordered list of three vectors (a0, b0, c0), each with three components, that define the cell used in the simulation in its reference configuration. The vectors (a0, b0, c0) need not be orthogonal, and the magnitudes (|a0|, |b0|, |c0|) are the reference cell vector lengths. Standard units: meter (m).
Repeatable: False |
| cell-periodicity | Definition: Specification of whether periodic boundary conditions are applied along the cell vector directions. This applies to both cell and cell-reference.
Values: Ordered list of three booleans (pa, pb, pc), which are set to true if periodic boundary conditions are applied along the corresponding cell vectors (a, b, c). Standard units: dimensionless.
Repeatable: False |
| temperature | Definition: A measure of the hotness or coldness of an external heat bath in thermal contact with the material.
Values: A real number containing the temperature T. Standard units: kelvin (K).
Repeatable: False |
| stress | Definition: A uniform force per unit current area, including both normal and shear components, applied to the material during the simulation through contact with an external loading device.
Values: An array of six real numbers containing the components of the (symmetric) Cauchy stress tensor σ in the following order: [σ11, σ22, σ33, σ23, σ13, σ12]. The stress components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c × a, and e3 = e1 × e2, where (a, b, c) are the vectors defining the cell, which need not be orthogonal. Standard units: pascal (Pa).
Note 1: Requires the definition of the cell subproperty in order for a basis to be defined relative to which the tensor components are given.
Note 2: The adopted sign convention is for tensile normal stresses to be positive, and compressive normal stresses to be negative. This implies that for a material subjected to a positive pressure p, the stress tensor components are [-p, -p, -p, 0, 0, 0].
Repeatable: False |
| strain | Definition: A deformation relative to a reference configuration imposed on the simulation cell expressed in the referential frame (Lagrangian strain tensor).
Values: An array of six real numbers containing the components of the (symmetric) Lagrangian strain tensor E in the following order: [E11, E22, E33, E23, E13, E12]. The strain components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a0, e2 is in the direction of c0 × a0, and e3 = e1 × e2, where (a0, b0, c0) are the vectors defining the cell in the reference configuration, which need not be orthogonal. Standard units: dimensionless.
Note: Requires the definition of cell-reference in order for the basis to be defined relative to which the tensor components are given.
Repeatable: False |
| strain-rate | Definition: The derivative with respect to time of a deformation relative to a reference configuration imposed on the simulation cell expressed in the referential frame (Lagrangian strain rate tensor), which is constant throughout the simulation.
Values: An array of six real numbers containing the components of the (symmetric) Lagrangian strain rate tensor Ė in the following order: [Ė11, Ė22, Ė33, Ė23, Ė13, Ė12]. The strain rate components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a0, e2 is in the direction of c0 × a0, and e3 = e1 × e2, where (a0, b0, c0) are the vectors defining the cell in the reference configuration, which need not be orthogonal. Standard units: inverse second (1/s).
Note: Requires the definition of cell-reference in order for the basis to be defined relative to which the tensor components are given.
Repeatable: False |
| heat-flux | Definition: The amount of thermal energy transferred to the material per unit area per unit time along a given direction.
Values: An array of three real numbers containing the components of the heat flux vector q. The heat flux components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c × a, and e3 = e1 × e2, where (a, b, c) are the vectors defining the cell, which need not be orthogonal. Standard units: watts per square meter (W/m²).
Repeatable: False |
| temperature-gradient | Definition: A physical quantity that describes the rate and direction of maximum temperature change per unit distance.
Values: An array of three real numbers containing the components of the temperature gradient vector ∇T. The temperature gradient components are expressed relative to an orthonormal basis (e1, e2, e3), where e1 is in the direction of a, e2 is in the direction of c × a, and e3 = e1 × e2, where (a, b, c) are the vectors defining the cell, which need not be orthogonal. Standard units: kelvin per meter (K/m).
Repeatable: False |
Note: The matcore-id format in the example is tentative and subject to change.
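
The orthonormal basis used for the stress, strain, heat-flux, and temperature-gradient components, and the sign convention for pressure, can be sketched in a few lines of Python. This is a minimal illustration of the definitions in the table rather than a reference implementation; the function names (`orthonormal_basis`, `hydrostatic_stress`) and the sample cell vectors are hypothetical.

```python
import math


def cross(u, v):
    """Cross product u x v of two 3-component vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def normalize(u):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(x * x for x in u))
    return tuple(x / length for x in u)


def orthonormal_basis(a, c):
    """Return (e1, e2, e3): e1 along a, e2 along c x a, e3 = e1 x e2.

    The cell vector b does not enter the construction.
    """
    e1 = normalize(a)
    e2 = normalize(cross(c, a))
    e3 = cross(e1, e2)
    return e1, e2, e3


def hydrostatic_stress(pressure):
    """Stress components [s11, s22, s33, s23, s13, s12] for a pressure p.

    Tensile normal stresses are positive, so a positive pressure gives
    negative normal components.
    """
    return [-pressure, -pressure, -pressure, 0.0, 0.0, 0.0]


# Hypothetical non-orthogonal cell (lengths in meters): a and c define the basis.
a = (4.0e-10, 0.0, 0.0)
c = (1.0e-10, 0.0, 5.0e-10)
e1, e2, e3 = orthonormal_basis(a, c)

# A pressure of 1 GPa expressed in pascals.
print(hydrostatic_stress(1.0e9))
```

For a cell whose vectors lie along the Cartesian axes, this construction returns the Cartesian basis itself, which is a convenient sanity check.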