Health-RI Metadata Model

Living Document

This version:
https://health-ri.github.io/metadata-documentation/
Latest published version:
https://w3id.org/health-ri/metadata/releases/latest
Previous Versions:
Feedback:
Report a bug
Open new issues
Status reported issues
Editors:
Bruna dos Santos Vieira
Kees Burger
Joost Daams
Ana Konrad
Hannah Neikes
Reinier Groeneveld
Niek van Ulzen
Former Editors:
Inês de Oliveira Coelho Henriques
César Bernabé
Patrick Dekker
Milou de Jong
Dena Tahvildari
Xiaofeng Liao
Mark Janse
Junda Huang
Luiz Bonino
License:
CC-BY-4.0

Abstract

The National Health Data Catalogue uses a core metadata schema: a set of minimal elements for describing each resource (e.g. dataset) with common metadata. It defines the requirements to access and reuse information across Health-RI nodes via the National Catalogue. This document contains the specification details on the Health-RI core metadata schema.

1. Permanent URLs

1.1. Project Homepage

1.2. Health-RI Metadata

1.3. Metadata Resources

To support long-term FAIR (Findable, Accessible, Interoperable, and Reusable) access to Health-RI Metadata Schema resources, we provide persistent URLs using the W3ID system. These permanent identifiers improve the findability and accessibility of the related artifacts. The following W3ID redirects are available for Health-RI metadata:

1.4. Versioning & Releases

2. Previously Published Versions

2.1. Previous versions

The previous published version (version 1.0.2) of the Health-RI core metadata schema is available here.

3. Audience Documentation

3.1. Purpose and audience

This branch contains the 2nd version of the Health-RI core and generic health metadata schema for the National Health Data Catalogue, detailing the classes and entities involved and offering usage notes for developers. It addresses the schema’s design and application but does not discuss the National Health Data Catalogue itself or its onboarding process (these are described here). Please also visit Confluence for general information about the metadata schema and metadata mapping. Note that the new schema is still being implemented in the frontend of the National Health Data Catalogue.

This documentation is aimed at a technical audience tasked with implementing the metadata schema, and at stakeholders interested in a detailed understanding of the core metadata schema. For further questions or comments, please contact Health-RI via the Health-RI ServiceDesk or at servicedesk@health-ri.nl.

4. Introduction

4.1. Scope and Current State of the Schema

Building on the 1st version of the metadata schema, version 2 aims to incorporate both DCAT-AP NL and the (yet-to-be-finalized) HealthDCAT-AP, along with Health-RI-specific requirements for the National Health Data Catalogue. It introduces several health-related properties (marked in blue in the UML diagram below), with suggested or required controlled vocabularies where applicable.

Important Note: HealthDCAT-AP has not yet been officially finalized and remains subject to change. Once its official release is published, Health-RI will reevaluate compatibility with HealthDCAT-AP. This version is based on the draft dated 16-12-2024. In that draft, the cardinalities of HealthDCAT-AP vary depending on the access rights of the dataset (public, restricted, non-public). For now, compliance is ensured with the open (public) version, using its UML diagram for reference.

Additionally, several ELSI-related metadata fields, as gathered by the Health-RI ELSI team, have been included in version 2. These fields are not mandatory but will be evaluated upon implementation in the catalogue.

To indicate the nature of the data (e.g., whole genome sequencing or questionnaire data), we propose using healthdcatap:healthTheme. For synthetic data, use dct:type with the required controlled vocabulary in the dcat:Dataset class.

Several classes from DCAT-AP NL and the draft HealthDCAT-AP have been included but not yet further specified for Health-RI. This includes the DataService class: these classes can be used, but are not yet tailored to the specific needs of data holders for the National Health Data Catalogue.

4.2. Used Prefixes

Prefix Namespace IRI
adms http://www.w3.org/ns/adms#
dcat http://www.w3.org/ns/dcat#
dcatap http://data.europa.eu/r5r/
dct http://purl.org/dc/terms/
dpv https://w3id.org/dpv#
dqv http://www.w3.org/ns/dqv#
eli http://data.europa.eu/eli/ontology#
foaf http://xmlns.com/foaf/0.1/
owl http://www.w3.org/2002/07/owl#
prov http://www.w3.org/ns/prov#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
skos http://www.w3.org/2004/02/skos/core#
spdx http://spdx.org/rdf/terms#
time http://www.w3.org/2006/time#
vcard http://www.w3.org/2006/vcard/ns#
xsd http://www.w3.org/2001/XMLSchema#
healthdcatap To Be Determined
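When generating or checking metadata records, it can help to keep the prefix table above in one place in code. The sketch below is a minimal, hypothetical helper (the names `PREFIXES`, `expand_curie` and `turtle_prefixes` are illustrative and not part of any Health-RI tooling). It copies a subset of the table and deliberately leaves out `healthdcatap`, whose namespace IRI is still to be determined, so expanding a `healthdcatap:` term fails loudly instead of producing a guessed IRI.

```python
# Hypothetical helper: a subset of the "Used Prefixes" table.
# healthdcatap is left out on purpose: its namespace IRI is still to be determined.
PREFIXES: dict[str, str] = {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatap": "http://data.europa.eu/r5r/",
    "dct": "http://purl.org/dc/terms/",
    "dpv": "https://w3id.org/dpv#",
    "eli": "http://data.europa.eu/eli/ontology#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a compact IRI such as 'dcat:Dataset' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown or undetermined prefix: {prefix!r}")
    return prefixes[prefix] + local


def turtle_prefixes(prefixes: dict[str, str] = PREFIXES) -> str:
    """Render the prefix map as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in sorted(prefixes.items()))


if __name__ == "__main__":
    print(expand_curie("dcat:Dataset"))  # http://www.w3.org/ns/dcat#Dataset
    print(turtle_prefixes())
```

Raising an error for unknown prefixes, rather than passing the term through unchanged, keeps undetermined namespaces such as `healthdcatap` from silently ending up in published metadata.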

5. Overview and Diagram

An overview of the metadata schema core is presented in the UML diagram below. This UML diagram showcases primary classes (entities), excluding detailed definitions such as rdfs:label and rdfs:comment. Each block represents a class and lists its attributes (properties). Where properties reference another class, their range is displayed in pink font.

If a class is linked to another class with a closed arrow, it inherits all properties from the other class (e.g., dcat:Dataset inherits from dcat:Resource). Other arrows represent relationships, including their types (e.g., dcat:Dataset connects to a dcat:DatasetSeries via dcat:inSeries), along with cardinalities (e.g., dcat:Dataset connects to zero or more dcat:DatasetSeries). Mandatory relationships are marked with dark labels, while recommended relationships use a lighter color.
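The cardinalities in the diagram and in the property tables below use a `min..max` notation, where `n` means "no upper limit" and a single number such as `1` means exactly that many values. The following is a small, hypothetical parser for that notation (the function names are illustrative); it also accepts the UML-style `*` as an assumption, although this document itself only uses `n`.

```python
import math


def parse_cardinality(text: str) -> tuple[int, float]:
    """Parse '1', '0..1', '0..n' or '1..n' into (minimum, maximum).

    An unbounded maximum ('n', or UML-style '*') is returned as math.inf.
    """
    low, sep, high = text.strip().partition("..")
    if not sep:
        high = low  # a single number such as '1' means exactly that many values
    minimum = int(low)
    maximum = math.inf if high in ("n", "*") else int(high)
    if minimum < 0 or maximum < minimum:
        raise ValueError(f"invalid cardinality: {text!r}")
    return minimum, maximum


def is_mandatory(cardinality: str) -> bool:
    """A property is mandatory when its minimum cardinality is at least 1."""
    return parse_cardinality(cardinality)[0] >= 1


def allows(cardinality: str, count: int) -> bool:
    """Check whether providing `count` values satisfies the cardinality."""
    minimum, maximum = parse_cardinality(cardinality)
    return minimum <= count <= maximum
```

For example, `allows("0..1", 2)` is `False`, because a property such as home page may occur at most once.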

The UML diagram separates main classes from supporting classes. While relationships between main classes are indicated by arrows, supporting class relationships are not visually connected via arrows to maintain clarity in the diagram. Instead, they can be deduced from the pink-colored property ranges listed per class.

Properties derived from the draft HealthDCAT-AP (mostly within the dcat:Dataset class) are marked in blue.

A tabular overview of all classes and properties—including their range, cardinality, controlled vocabulary (if applicable), and usage notes—is provided below. A reference sheet containing this information can be found here. This sheet also documents property histories (compared to v1 of the Health-RI core metadata schema) and specifies the origins of new constraints (whether they stem from DCAT-AP v3, DCAT-AP NL, or HealthDCAT-AP).

5.1. UML Class Diagram v2.0.2

Health-RI Metadata Diagram

Usage directions for the properties and their associated constraints that apply in the context of this profile, and the range of properties, are listed below.

5.2. Class Structure

The Health-RI metadata schema builds on DCAT-AP 3.0, which defines a set of classes and properties for describing datasets, services, and related resources. To make the model easier to apply (and interoperable) across catalogues, the schema organizes its structure around two types of classes: Main Classes and Supportive Classes.

Main classes in this structure:

Supportive classes in this structure:

The main classes and supportive classes together form the Health-RI Core metadata schema.

Please take into consideration:

5.3. Usage Notes on Schema / Mapping

Supportive classes are included because they form the range of properties used by the main classes, and they specify additional requirements for those properties. They enrich the main classes, which are the core entities in the catalogue. Both types of classes are further divided into mandatory properties for conformance and recommended properties for richer metadata.

The separation described above modularizes the metadata, makes the classes easier to use, and makes it easier to reuse supporting elements across multiple datasets or services. To apply it:

The following sections describe each class in detail, including its role in the schema, its mandatory and recommended properties, and examples of how to populate them.

6. Main Classes

Main Classes represent the core entities of the data catalog and define its leading components.

6.1. Mandatory

6.1.1. Catalog

Class name HealthDCAT-AP Definition Usage Note URI
Catalog A catalogue or repository that hosts the Datasets or Data Services being described. A catalog that is listed in the National Health Data catalog and contains one or several datasets and/or data services. Used to describe a bundle of datasets (and other resources) under a single title, for example, a collection. dcat:Catalog
Property label Definition Property URI Range Cardinality Usage note
applicable legislation The legislation that is applicable to this resource. dcatap:applicableLegislation eli:LegalResource (IRI) 0..n
catalog A catalog that is listed in the catalog. dcat:catalog dcat:Catalog (IRI) 0..n For certain research projects, multiple catalogs may need to be organized in a nested manner. This property serves to connect the different catalogs with each other.
contact point Relevant contact information for the cataloged resource. dcat:contactPoint vcard:Kind 1 This property points to a contact point (department or person) that can answer questions about the catalogue. Details on how to describe these are provided under class vcard:Kind.
Whenever possible, use general contact information (for example, that of a department) instead of an individual's contact information.
creator An entity responsible for making the resource. dct:creator foaf:Agent 0..n Note that the Health-RI model diverges from DCAT-AP NL here: DCAT-AP NL limits the maximum number of creators to 1, whereas the Health-RI model allows multiple creators of a catalogue to be specified.
dataset A dataset that is listed in the catalog. dcat:dataset dcat:Dataset 0..n A catalog can contain one or more datasets. This property links datasets to a catalogue; a dataset is therefore always contained in a catalogue.
description An account of the resource. dct:description rdfs:Literal 1..n Briefly describe the catalog and what it contains. You can repeat this in multiple languages.
geographical coverage Spatial characteristics of the resource. dct:spatial dct:Location (IRI) 0..n The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used. For districts or neighbourhoods in NL, the Dutch vocabulary can be used.
has part A related resource that is included either physically or logically in the described resource. dct:hasPart dcat:Catalog (IRI) 0..n Use this property to include another catalogue in this catalogue.
Note that deeply nested structures should be avoided; they cannot currently be displayed in the National Health Data Catalogue.
home page A homepage for some thing. foaf:homepage foaf:Document (IRI) 0..1 The home page of the catalogue, if available.
language A language of the resource. dct:language dct:LinguisticSystem (IRI) 0..n The value of this property must be an IRI from the provided controlled vocabulary.
For example: http://publications.europa.eu/resource/authority/language/NLD
licence A legal document giving official permission to do something with the resource. dct:license dct:LicenseDocument (IRI) 0..1 The licence under which the catalogue (including its resource descriptions) is made available. The licences of distributions and data services are specified in their own descriptions.
modification date Date on which the resource was changed. dct:modified xsd:dateTime 0..1
publisher An entity responsible for making the resource available. dct:publisher foaf:Agent 1 The organisation or individual that is holder of the intellectual property rights of a dataset. For more details about the publisher, see the class Agent. Example: Radboudumc.
release date Date of formal issuance of the resource. dct:issued xsd:dateTime 0..1
rights Information about rights held in and over the resource. dct:rights dct:RightsStatement (IRI) 0..1 The rights statement should be a free-text statement placed at a web-accessible location such that the value of this property is the IRI pointing to that statement.
service A service that is listed in the catalog. dcat:service dcat:DataService (IRI) 0..n Some datasets may have real-time Data Services (e.g., Beacon API counting individuals). IT teams should define the relationship between the catalog and the Data Service via this property. While crucial for interoperability, this property is not relevant for end-users to collect.
temporal coverage Temporal characteristics of the resource. dct:temporal dct:PeriodOfTime 0..n
themes A main category of the resource. A resource can have multiple themes. dcat:themeTaxonomy skos:ConceptScheme (IRI) 0..n This property refers to a knowledge organisation system used to classify the Catalogue’s Datasets. It must have at least the value NAL:data-theme as this is the mandatory controlled vocabulary for dcat:theme.
title A name given to the resource. dct:title rdfs:Literal 1..n Provide a unique title for your catalog, which can be repeated in multiple languages. Example: Healthy Brain Study.
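To make the Catalog cardinalities concrete, here is a minimal sketch of a checker that counts the values given per property and compares them with the table above. It is an illustration under stated assumptions, not Health-RI tooling: the record is modelled as a plain dictionary from property CURIE to a list of values, and the names `CATALOG_CARDINALITIES` and `cardinality_errors`, as well as the `example.org` IRIs, are hypothetical.

```python
import math

INF = math.inf  # stands for "n" in the cardinality column

# (minimum, maximum) per dcat:Catalog property, copied from the table above.
CATALOG_CARDINALITIES: dict[str, tuple[int, float]] = {
    "dcatap:applicableLegislation": (0, INF),
    "dcat:catalog": (0, INF),
    "dcat:contactPoint": (1, 1),
    "dct:creator": (0, INF),
    "dcat:dataset": (0, INF),
    "dct:description": (1, INF),
    "dct:spatial": (0, INF),
    "dct:hasPart": (0, INF),
    "foaf:homepage": (0, 1),
    "dct:language": (0, INF),
    "dct:license": (0, 1),
    "dct:modified": (0, 1),
    "dct:publisher": (1, 1),
    "dct:issued": (0, 1),
    "dct:rights": (0, 1),
    "dcat:service": (0, INF),
    "dct:temporal": (0, INF),
    "dcat:themeTaxonomy": (0, INF),
    "dct:title": (1, INF),
}


def cardinality_errors(record: dict[str, list],
                       cardinalities: dict[str, tuple[int, float]]) -> list[str]:
    """Return one message per property whose number of values violates its cardinality."""
    errors = []
    for prop, (low, high) in cardinalities.items():
        count = len(record.get(prop, []))
        if not low <= count <= high:
            upper = "n" if high == INF else str(high)
            errors.append(f"{prop}: {count} value(s), expected {low}..{upper}")
    return errors


# Hypothetical catalogue record containing only the mandatory properties.
catalog = {
    "dct:title": ["Healthy Brain Study"],
    "dct:description": ["Catalogue of the Healthy Brain Study datasets."],
    "dct:publisher": ["https://example.org/agent/radboudumc"],
    "dcat:contactPoint": ["https://example.org/contact/servicedesk"],
}
print(cardinality_errors(catalog, CATALOG_CARDINALITIES))  # []
```

Removing the publisher from this record would yield `dct:publisher: 0 value(s), expected 1..1`. A production setup would more likely express these constraints in a shape language such as SHACL; the dictionary form is only meant to show how the table is read.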

6.1.2. Dataset

Class name HealthDCAT-AP Definition Usage Note URI
Dataset A conceptual entity that represents the information published. When focusing on health data, a dataset typically contains structured information gathered from a study or research project related to health topics. This might include clinical trial results, public health statistics, patient records, survey data, etc.
How the data in a dataset can be accessed is defined in the Distribution, which usually points to the actual data files available for access or download. Datasets are often included in a catalog, which organizes and provides metadata about multiple datasets, making them easier to find and use. The term 'agent' refers to any entity responsible for creating, maintaining, or distributing the dataset.
dcat:Dataset
Property label Definition Property URI Range Cardinality Usage note
access rights Information about who can access the resource or an indication of its security status. dct:accessRights dct:RightsStatement (IRI) 1 Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public. One of the three options must be used: public, restricted or non-public.
    Open Data (Public): The dataset is available under general open data rules, such as those covered by the High Value Datasets Implementing Regulation.
    Protected Data (Restricted): The dataset contains protected data and is accessible only under specific conditions, as outlined in regulations like the Data Governance Act.
    Sensitive Data (Non-public): The dataset includes resources that may contain sensitive or personal information, falling under regulations such as the EHDS Regulation.

Since most health data contain personal information, such datasets will need the value 'non-public' for the access rights property.
analytics An analytics distribution of the dataset. healthdcatap:analytics dcat:Distribution (IRI) 0..n Publishers are encouraged to provide URLs pointing to document repositories where users can access or request associated resources, such as technical reports of the dataset, quality measurements and usability indicators. Note that HealthDCAT-AP also mentions API endpoints or analytics services, but these would be Data Services rather than Distributions.
applicable legislation The legislation that is applicable to this resource. dcatap:applicableLegislation eli:LegalResource (IRI) 1..n The ELI of the EHDS Regulation was published in March 2025 and can now be included as the applicable legislation mandating publication of the dataset.
For health datasets, the value must include the ELI of the EHDS Regulation (http://data.europa.eu/eli/reg/2025/327/oj).
As multiple legislations may apply to the resource, the maximum cardinality is not limited.

While the applicable legislation indicates which legislation mandates the publication of the dataset, the legal basis property (also in Dataset) describes the legal basis for the initial collection and processing of (personal) data.
code values Health classifications and their codes associated with the dataset. healthdcatap:hasCodeValues skos:Concept (IRI) 0..n In this property, you can provide the coding system of the dataset in the form of a Wikidata URI (example: https://www.wikidata.org/entity/P494 for ICD-10 ID) and the URI of the value that describes the dataset (example: https://icd.who.int/browse10/2019/en#/Y59.0 for viral vaccines).
coding system Coding systems in use (e.g. ICD-10-CM, DRGs, SNOMED CT, ...) healthdcatap:hasCodingSystem dct:Standard (IRI) 0..n This property provides information on which coding systems are used in your dataset. Wikidata URIs must be used for this.
conforms to An established standard to which the described resource conforms. dct:conformsTo dct:Standard (IRI) 0..n If your data conforms to an established standard or specification, use this property to indicate which one. The Wikidata URI of the specification must be used. Example: https://www.wikidata.org/wiki/Q19597236 for FHIR.
contact point Relevant contact information for the cataloged resource. dcat:contactPoint vcard:Kind 1 This property points to a contact point (department or person) that can answer questions about the dataset. Details on how to describe these are provided under class vcard:Kind.
Whenever possible, use general contact information (for example, that of a department) instead of an individual's contact information.
creator An entity responsible for making the resource. dct:creator foaf:Agent 1..n This property points to a person (known as Agent) responsible for generating the dataset. In most cases, this should be the project’s Principal Investigator, provided they consent to being listed in the catalogue. If not, the associated department or institute may be specified instead.
description An account of the resource. dct:description rdfs:Literal 1..n Brief description of the dataset. You can repeat this property in multiple languages. Example: "Collection of physiological data of Healthy Brain Study participants. This collection includes measurements via biowearables for heart rate, oxygenation, systolic and diastolic measures and stress levels."
distribution An available Distribution for the Dataset. dcat:distribution dcat:Distribution (IRI) 0..n Metadata element used as a key link to the class Distribution.
documentation A page or document about this thing. foaf:page foaf:Document (IRI) 0..n The value of this property is the IRI directing to the web-page or document about the dataset.
frequency The frequency with which items are added to a collection. dct:accrualPeriodicity skos:Concept (IRI) 0..1 The value of this property should be the IRI from the listed controlled vocabulary indicating the frequency at which the dataset is updated.
For example: http://publications.europa.eu/resource/authority/frequency/WEEKLY
geographical coverage Spatial characteristics of the resource. dct:spatial dct:Location (IRI) 0..n The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used. For districts or neighbourhoods in NL, the Dutch vocabulary can be used. However, in many cases it may be desirable to keep the geographical coverage broad (e.g. indicating only that NL is covered), so as not to expose detailed information about subjects' locations.
has version This resource has a more specific, versioned resource. dcat:hasVersion dcat:Dataset (IRI) 0..n Indicate the dataset that is another version of the current dataset.
health theme A category of the Dataset or tag describing the Dataset. healthdcatap:healthTheme skos:Concept (IRI) 0..n This property is a structured way to tag the dataset with different health themes. This could include, for example, the specific disease the dataset is about. More details can be provided, if desirable, in the keyword property. Current status: the HealthDCAT-AP working group is exploring whether other sources (ontologies, thesauri) can be used for this, in addition to Wikidata. To access Wikidata, click on the link in the controlled vocabulary column and search for your desired theme there.
identifier An unambiguous reference to the resource within a given context. dct:identifier rdfs:Literal 1 Current status: Health-RI is working on a strategy for persistent identifiers for, among other things, datasets. Until a solid solution has been found, we propose the following temporary solution:
If your data is published in a repository, fill in with the provided DOI (Example: https://doi.org/10.34894/ZLOYOJ). If not, use the identifier of the dataset as generated in the FAIR data point (FDP). Ensure that metadata is updated if your situation changes.
in series A dataset series of which the dataset is part. dcat:inSeries dcat:DatasetSeries (IRI) 0..n This property points to which Dataset Series the Dataset is part of.
is referenced by A related resource that references, cites, or otherwise points to the described resource. dct:isReferencedBy rdfs:Resource (IRI) 0..n The value of this property is the IRI (for example, the DOI) of the publication or other related resource.
For example: https://doi.org/10.1186/s13690-021-00709-x
keyword A keyword or tag describing the resource. dcat:keyword rdfs:Literal 1..n Add keywords to increase dataset discoverability. You can include keywords in different languages, submitting each keyword as a separate entry. Example: Physiological measures, Heart Rate, Stress Measures.
language A language of the resource. dct:language dct:LinguisticSystem (IRI) 0..n The language of the Dataset. For this property, the values from the EU Vocabularies Languages Named Authority List must be used. If your Dataset contains multiple languages, this property can be repeated.
legal basis Indicates use or applicability of a Legal Basis. dpv:hasLegalBasis dpv:LegalBasis (IRI) 0..n The legal basis can be provided as a value from the dpv taxonomy (see Controlled vocabulary column).

While the applicable legislation indicates which legislation mandates the publication of the dataset, the legal basis property describes the legal basis for the initial collection and processing of (personal) data.

Example value for this property could be: dpv:Consent
maximum typical age Maximum typical age of the population within the dataset healthdcatap:maxTypicalAge xsd:nonNegativeInteger 0..1 The approximate maximum age of subjects in the dataset, if applicable. Approximate age is given to protect potentially sensitive information of subjects in the dataset.
minimum typical age Minimum typical age of the population within the dataset healthdcatap:minTypicalAge xsd:nonNegativeInteger 0..1 The approximate minimum age of subjects in the dataset, if applicable. Approximate age is given to protect potentially sensitive information of subjects in the dataset.
modification date Date on which the resource was changed. dct:modified xsd:dateTime 0..1 This property indicates changes to the dataset, not the metadata record. An absent value may mean the resource hasn’t changed since publication, the modification date is unknown, or the resource is continuously updated.
number of records Size of the dataset in terms of the number of records healthdcatap:numberOfRecords xsd:nonNegativeInteger 0..1 Number of records inside a Dataset.
number of unique individuals Number of records for unique individuals. healthdcatap:numberOfUniqueIndividuals xsd:nonNegativeInteger 0..1 This property is not mandatory, since not all datasets include data about individuals.
other identifier Links a resource to an adms:Identifier class. adms:identifier adms:Identifier 0..n Examples of secondary identifiers are MAST/ADS, DataCite, DOI, EZID or W3ID (if not used for the original identifier). This property makes use of another small class, adms:Identifier, in which you provide the identifier and the name of the identifier scheme (e.g. DOI).
personal data Indicates association with Personal Data. dpv:hasPersonalData dpv:PersonalData (IRI) 0..n The different types of personal information that are collected in the dataset can be indicated with this property. Values can be picked from the dpv taxonomy (see controlled vocabulary column).
For example: dpv-pd:Gender
population coverage A definition of the population within the dataset healthdcatap:populationCoverage rdfs:Literal 0..n This field is a free text description of the population covered in the dataset. For example, "Adults aged 18–65 diagnosed with type 2 diabetes in the Netherlands between 2015 and 2020".
publisher An entity responsible for making the resource available. dct:publisher foaf:Agent 1 This property identifies the organisation or individual responsible for making the dataset available. For datasets, this is typically the employer of the data creators. In simple cases, the dataset publisher may be the same as the catalog publisher. In more complex settings, such as when datasets come from multiple institutions within a consortium, the consortium should be listed as the publisher where possible. If no formal consortium can be specified, provide the information of the contributing organizations or individuals under dct:creator instead. For more details, refer to the Agent class.
purpose Indicates association with Purpose. dpv:hasPurpose dpv:Purpose (IRI) 0..n One (or many) category or sub-category of the purposes can be chosen from the taxonomy provided by dpv (see controlled vocabulary column).
Example value could be: dpv:ResearchAndDevelopment.
qualified attribution Attribution is the ascribing of an entity to an agent. prov:qualifiedAttribution prov:Attribution 0..n This property makes use of another small class (prov:Attribution). There, you can choose one of the roles as listed in the controlled vocabulary and link that to a specific Agent (expressed with foaf:Agent). Note that for HealthDCAT-AP, the list of roles might be extended in the future.
Example: https://standards.iso.org/iso/19115/resources/Codelists/gml/CI_RoleCode.xml#processor

Use this property if you would like to indicate the funder of the (research project that resulted in creation of the) dataset.
The value for role then becomes: https://standards.iso.org/iso/19115/resources/Codelists/gml/CI_RoleCode.xml#funder
qualified relation Link to a description of a relationship with another resource. dcat:qualifiedRelation dcat:Relationship 0..n This property makes use of another small class (dcat:Relationship), in which you can indicate the related resource (via its identifier) and the nature of the relation (based on a controlled vocabulary, which is described in the information of the class).
quality annotation Refers to a quality annotation. dqv:hasQualityAnnotation dqv:QualityCertificate 0..n This property makes use of another small class (dqv:QualityCertificate), in which you indicate the IRI of the quality certificate, linked to the described resource (via the identifier of the dataset). See that class for more information.
release date Date of formal issuance of the resource. dct:issued xsd:dateTime 0..1 This property should point to the first known date of issuance, such as the publication date in a data repository. Example: 2023-12-10T13:16:10.246Z.
retention period A temporal period during which the dataset is available for secondary use. healthdcatap:retentionPeriod dct:PeriodOfTime 0..1 This property makes use of the class dct:PeriodOfTime, in which a start and end date should be provided.
sample Links to a sample of an Asset (which is itself an Asset). adms:sample dcat:Distribution (IRI) 0..n This property makes use of the dcat:Distribution class to describe a sample distribution of the dataset, which can be anonymized or synthetic data, or the data dictionary provided in CSVW format. This is currently being developed further by the TEHDAS2 program. More information can be found here: https://healthdcat-ap.github.io/#sample-distribution
source A related resource from which the described resource is derived. dct:source dcat:Dataset (IRI) 0..n Indicate the dataset on which this described dataset is based.
status The status of the Asset in the context of a particular workflow process. adms:status skos:Concept (IRI) 0..1 This property makes use of a controlled vocabulary to indicate the status of the described dataset.
For example: http://publications.europa.eu/resource/authority/dataset-status/COMPLETED
temporal coverage Temporal characteristics of the resource. dct:temporal dct:PeriodOfTime 0..n The start and end date of the period that the dataset covers. This property makes use of a small class: Period of Time, in which a start and end date can be given.
temporal resolution Minimum time period resolvable in the dataset. dcat:temporalResolution xsd:duration 0..1 If the dataset is a time-series, this should correspond to the spacing of items in the series. For other kinds of dataset, this property will usually indicate the smallest time difference between items in the dataset. The time period has to be provided in the xsd:duration format.
theme A main category of the resource. A resource can have multiple themes. dcat:theme skos:Concept (IRI) 1..n This property should use a controlled vocabulary. In the Health Data Catalogue, all datasets will use the theme 'HEAL' (http://publications.europa.eu/resource/authority/data-theme/HEAL), but additional values from the same vocabulary are allowed.
title A name given to the resource. dct:title rdfs:Literal 1..n Provide a unique title for your Dataset, which can be repeated in multiple languages. Example: Healthy Brain Study - Physiological Data
type The nature or genre of the resource. dct:type skos:Concept (IRI) 0..n The data-type controlled vocabulary is recommended. Health datasets with personal information must use 'personal data'. This list supports dataset categorization for the EU Open Data Portal. However, 'PERSONAL_DATA' is currently not included in the EU vocabulary and therefore cannot be filled in yet.
version The version indicator (name or identifier) of a resource. dcat:version rdfs:Literal 0..1 Suggested practice: track major_version.minor_version. Register a new identifier for major changes (e.g., 1.0.0 for an unchanged dataset).
version notes A description of changes between this version and the previous version of the Asset. adms:versionNotes rdfs:Literal 0..n Provide a short description of changes made to the dataset from the previous version.
was generated by Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation. prov:wasGeneratedBy prov:Activity (IRI) 0..n
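The Dataset table combines cardinality rules with a few value rules: the applicable legislation must include the EHDS Regulation ELI, the themes must include HEAL, and the access rights must be one of public, restricted or non-public. The sketch below checks only those rules; it is a hypothetical illustration (the names `MANDATORY`, `dataset_errors` and all `example.org` IRIs are made up). The access-right IRIs assume the EU Access Rights Named Authority List (the NON_PUBLIC value is quoted in the Data Service table below; PUBLIC and RESTRICTED follow the same pattern).

```python
from typing import Optional

# Mandatory dcat:Dataset properties (minimum cardinality 1 in the table above),
# mapped to their maximum cardinality; None means unbounded ("n").
MANDATORY: dict[str, Optional[int]] = {
    "dct:accessRights": 1,
    "dcatap:applicableLegislation": None,
    "dcat:contactPoint": 1,
    "dct:creator": None,
    "dct:description": None,
    "dct:identifier": 1,
    "dcat:keyword": None,
    "dct:publisher": 1,
    "dcat:theme": None,
    "dct:title": None,
}

EHDS_ELI = "http://data.europa.eu/eli/reg/2025/327/oj"
HEAL_THEME = "http://publications.europa.eu/resource/authority/data-theme/HEAL"
ACCESS_RIGHTS = {
    "public": "http://publications.europa.eu/resource/authority/access-right/PUBLIC",
    "restricted": "http://publications.europa.eu/resource/authority/access-right/RESTRICTED",
    "non-public": "http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC",
}


def dataset_errors(record: dict[str, list]) -> list[str]:
    """Check mandatory properties and the value rules stated for health datasets."""
    errors = []
    for prop, maximum in MANDATORY.items():
        count = len(record.get(prop, []))
        if count == 0:
            errors.append(f"{prop}: missing mandatory value")
        elif maximum is not None and count > maximum:
            errors.append(f"{prop}: {count} values, at most {maximum} allowed")
    if EHDS_ELI not in record.get("dcatap:applicableLegislation", []):
        errors.append("dcatap:applicableLegislation: EHDS Regulation ELI not included")
    if HEAL_THEME not in record.get("dcat:theme", []):
        errors.append("dcat:theme: HEAL theme not included")
    for value in record.get("dct:accessRights", []):
        if value not in ACCESS_RIGHTS.values():
            errors.append(f"dct:accessRights: {value} is not public, restricted or non-public")
    return errors


# Hypothetical record for a non-public health dataset.
dataset = {
    "dct:accessRights": [ACCESS_RIGHTS["non-public"]],
    "dcatap:applicableLegislation": [EHDS_ELI],
    "dcat:contactPoint": ["https://example.org/contact/servicedesk"],
    "dct:creator": ["https://example.org/agent/principal-investigator"],
    "dct:description": ["Collection of physiological data of Healthy Brain Study participants."],
    "dct:identifier": ["https://doi.org/10.34894/ZLOYOJ"],
    "dcat:keyword": ["Heart Rate", "Stress Measures"],
    "dct:publisher": ["https://example.org/agent/radboudumc"],
    "dcat:theme": [HEAL_THEME],
    "dct:title": ["Healthy Brain Study - Physiological Data"],
}
print(dataset_errors(dataset))  # []
```

The value rules are checked separately from the counts so that, for example, a record with themes but without HEAL is still reported even though `dcat:theme` has a valid number of values.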

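The temporal resolution of a dataset must be given in the xsd:duration format, for example `P1D` for daily or `PT15M` for 15-minute intervals. As a hedged sketch, the hypothetical helper below turns a Python `timedelta` into such a value. It only produces day, hour, minute and second components, because `timedelta` has no notion of months or years; values such as `P1M` (monthly) would have to be written by hand.

```python
from datetime import timedelta


def to_xsd_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as an xsd:duration value, e.g. 'P1D' or 'PT15M'."""
    if delta < timedelta(0):
        raise ValueError("a temporal resolution cannot be negative")
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)

    date_part = f"{delta.days}D" if delta.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if delta.microseconds:
        # Fractional seconds without trailing zeros, e.g. '0.5S'
        time_part += f"{seconds}.{delta.microseconds:06d}".rstrip("0") + "S"
    elif seconds:
        time_part += f"{seconds}S"

    if not date_part and not time_part:
        return "PT0S"  # a zero duration still needs at least one component
    return "P" + date_part + ("T" + time_part if time_part else "")


print(to_xsd_duration(timedelta(hours=1)))  # PT1H
```

A combined value such as `timedelta(days=1, hours=12)` becomes `P1DT12H`; the `T` separator is only emitted when there is a time component.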
6.2.1. Data Service

Class name HealthDCAT-AP Definition Usage Note URI
Data Service A collection of operations that provides access to one or more datasets or data processing functions. A Data Service offers the possibility to access and query the data of one or several datasets through operations. It offers more extensive ways to access the data than a Distribution, through a variety of potential actions. An example of a Data Service is a Beacon API to query genomics data. dcat:DataService
Property label Definition Property URI Range Cardinality Usage note
access rights Information about who access the resource or an indication of its security status. dct:accessRights dct:RightsStatement (IRI) 1 Information that indicates whether the Dataset is publicly accessible, has access restrictions or is not public. This property is required to adopt one of the predefined values listed in the Access Rights Named Authority List provided by the Publications Office. This designation informs data users whether the dataset is considered open data or is classified as non-public. For example, for non-public data, use the value: http://publications.europa.eu/resource/authority/access-right/NON_PUBLIC
applicable legislation The legislation that is applicable to this resource. dcatap:applicableLegislation eli:LegalResource (IRI) 0..n
application profile An established standard to which the described resource conforms. dct:conformsTo dct:Standard (IRI) 0..n The standards referred to here SHOULD describe the Data Service and not the data it serves. The latter is provided by the dataset with which this Data Service is connected. For instance, the data service adheres to the OGC WFS API standard, while the associated dataset adheres to the INSPIRE Address data model.
contact point Relevant contact information for the cataloged resource. dcat:contactPoint vcard:Kind 1 This property points to a contact point (department or person) that can answer questions about the data service. Details on how to describe these are provided under class vcard:Kind.
Whenever possible, use general contact information (for example, that of a department) instead of the contact information of an individual.
creator An entity responsible for making the resource. dct:creator foaf:Agent 0..n Note that the Health-RI model diverges here from DCAT-AP NL, which limits the maximum number of creators to 1. The Health-RI model allows specification of multiple creators of a data service.
rights Information about rights held in and over the resource. dct:rights dct:RightsStatement (IRI) 0..n
description An account of the resource. dct:description rdfs:Literal 1..n Briefly describe the data service provided. You can repeat this description in multiple languages. Example: A data service that provides API access to real-time electrocardiogram (ECG) monitoring data for clinical research applications.
end point description A description of the services available via the end-points, including their operations, parameters etc. dcat:endpointDescription rdfs:Resource (IRI) 1 Provides technical documentation that explains how to access and interact with the data service’s endpoint.
end point URL The root location or primary endpoint of the service (a Web-resolvable IRI). dcat:endpointURL rdfs:Resource (IRI) 1 Provide the URL of the endpoint that users can interact with to access the data service. This should be a direct link to the service’s endpoint, such as an API URL, SPARQL endpoint, or similar.
format The file format, physical medium, or dimensions of the resource. dct:format dct:MediaTypeOrExtent (IRI) 0..n
HVD Category A data category defined in the High Value Dataset Implementing Regulation. dcatap:hvdCategory skos:Concept (IRI) 0..n
identifier An unambiguous reference to the resource within a given context. dct:identifier rdfs:Literal 1 Provide a unique identifier for the data service. This could be a globally unique and persistent identifier, such as a DOI, URN, or UUID. If no persistent identifier is available, you may use the accessURL or endpointURL as the identifier, provided it is stable and unique to the service.
keyword A keyword or tag describing the resource. dcat:keyword rdfs:Literal 0..n
landing page A Web page that can be navigated to in a Web browser to gain access to the catalog, a dataset, its distributions and/or additional information. dcat:landingPage foaf:Document (IRI) 0..n It is intended to point to a landing page at the original data service provider, not to a page on a site of a third party, such as an aggregator.
language A language of the resource. dct:language dct:LinguisticSystem (IRI) 0..n Indicates the natural language used in the data service, indicated with a value from the EU controlled vocabulary.
licence A legal document giving official permission to do something with the resource. dct:license dct:LicenseDocument (IRI) 1 For public data, use a Creative Commons (CC) license (see Geonovum options in the Controlled Vocabulary column). For most National Health Data Catalogue data services, where data is not public, use the 'not open' license from Geonovum and specify data reuse agreements in the dct:rights property.
modification date Date on which the resource was changed. dct:modified xsd:dateTime 0..1 This property indicates the date of the last changes to the dataset, not the metadata record. An absent value may mean the resource hasn’t changed since publication, the modification date is unknown, or the resource is continuously updated.
other identifier Links a resource to an adms:Identifier class. adms:identifier adms:Identifier 0..n
publisher An entity responsible for making the resource available. dct:publisher foaf:Agent 1 The organization or individual responsible for making the data service available. In the context of data services, the publisher is typically the organization that manages or provides access to the service. For details, see the class Agent.
serves dataset A collection of data that this data service can distribute. dcat:servesDataset dcat:Dataset (IRI) 0..n This property connects the Data Service class to its corresponding dataset(s), ensuring every data service links to at least one dcat:Dataset. While essential for metadata implementation teams on each node, it’s less relevant for researchers to collect.
theme A main category of the resource. A resource can have multiple themes. dcat:theme skos:Concept (IRI) 1..n This property should use a controlled vocabulary. In the Health Data Catalogue, most data services will use NAL:data-theme 'HEAL', but additional values from the same vocabulary are allowed.
title A name given to the resource. dct:title rdfs:Literal 1..n Provide a unique title for your data service, which can be repeated in multiple languages. Example: Patient counts per available diagnosis
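The cardinalities in the Data Service table lend themselves to a mechanical completeness check before a record is submitted. The sketch below is illustrative only. It assumes a record is held as a Python dict mapping property CURIEs to lists of values, which is a hypothetical structure and not a Health-RI tool. It also accepts only the PUBLIC, RESTRICTED and NON_PUBLIC access-right values, which is an assumption about the subset in use. All `example.org` IRIs are placeholders.

```python
"""Illustrative completeness check for a Health-RI Data Service record.

Assumption: a record is a dict mapping property CURIEs to lists of values.
This structure is hypothetical and not part of any Health-RI tool.
"""
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) cardinality of each mandatory Data Service property;
# None as maximum stands for "n" (unbounded).
# Note: DCAT spells the endpoint property dcat:endpointURL.
MANDATORY: Dict[str, Tuple[int, Optional[int]]] = {
    "dct:accessRights": (1, 1),
    "dcat:contactPoint": (1, 1),
    "dct:description": (1, None),
    "dcat:endpointDescription": (1, 1),
    "dcat:endpointURL": (1, 1),
    "dct:identifier": (1, 1),
    "dct:license": (1, 1),
    "dct:publisher": (1, 1),
    "dcat:theme": (1, None),
    "dct:title": (1, None),
}

ACCESS_RIGHTS_BASE = "http://publications.europa.eu/resource/authority/access-right/"
# Assumption: only these three access-right codes are accepted in this sketch.
ACCESS_RIGHTS_CODES = {"PUBLIC", "RESTRICTED", "NON_PUBLIC"}


def check_data_service(record: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for prop, (low, high) in MANDATORY.items():
        count = len(record.get(prop, []))
        if count < low:
            problems.append(f"{prop}: expected at least {low} value(s), found {count}")
        elif high is not None and count > high:
            problems.append(f"{prop}: expected at most {high} value(s), found {count}")
    for value in record.get("dct:accessRights", []):
        code = value[len(ACCESS_RIGHTS_BASE):]
        if not value.startswith(ACCESS_RIGHTS_BASE) or code not in ACCESS_RIGHTS_CODES:
            problems.append(f"dct:accessRights: {value} is not a recognised access-right IRI")
    return problems


# Hypothetical record; all example.org IRIs are placeholders.
example_service = {
    "dct:accessRights": [ACCESS_RIGHTS_BASE + "NON_PUBLIC"],
    "dcat:contactPoint": ["https://example.org/contact/ecg-team"],
    "dct:description": ["API access to ECG monitoring data for clinical research."],
    "dcat:endpointDescription": ["https://example.org/ecg-api/openapi.json"],
    "dcat:endpointURL": ["https://example.org/ecg-api/"],
    "dct:identifier": ["https://example.org/id/ecg-service"],
    "dct:license": ["https://example.org/licence/not-open"],
    "dct:publisher": ["https://example.org/org/hospital"],
    "dcat:theme": ["http://publications.europa.eu/resource/authority/data-theme/HEAL"],
    "dct:title": ["Patient counts per available diagnosis"],
}

print(check_data_service(example_service))  # []
```

If you remove `dct:title` from the example record, the check reports one problem for that property. A SHACL validator run over the RDF itself would be a more common choice in production. The dict version is only meant to make the cardinality logic visible.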

6.2.2. Dataset Series

Class name HealthDCAT-AP Definition Usage Note URI
Dataset Series A collection of datasets that are published separately, but share some characteristics that group them. A Dataset Series is a collection of similar datasets that are somehow interrelated but published separately. An example is consecutive datasets split by year and/or datasets separated by location. Instead of being made available in a single dataset, the individual (e.g. yearly) datasets are linked together with the Dataset Series class. dcat:DatasetSeries
Property label Definition Property URI Range Cardinality Usage note
applicable legislation The legislation that is applicable to this resource. dcatap:applicableLegislation eli:LegalResource (IRI) 0..n The legislation that mandates the creation or management of the Dataset Series.
contact point Relevant contact information for the cataloged resource. dcat:contactPoint vcard:Kind 1..n This property points to a contact point (department or person) that can answer questions about the dataset series. Details on how to describe these are provided under class vcard:Kind.
Whenever possible, use general contact information (for example, that of a department) instead of the contact information of an individual.
description An account of the resource. dct:description rdfs:Literal 1..n Briefly describe the dataset series in the catalog. You can repeat this in multiple languages.
frequency The frequency with which items are added to a collection. dct:accrualPeriodicity skos:Concept (IRI) 0..1 The frequency of a dataset series is not equal to the frequency of the datasets in the collection.
geographical coverage Spatial characteristics of the resource. dct:spatial dct:Location (IRI) 0..n The EU Vocabularies Named Authority Lists must be used for continents, countries and places that are in those lists. If a particular location is not in one of these Named Authority Lists, Geonames URIs must be used. For districts or neighbourhoods in NL, the Dutch vocabulary can be used. However, in many cases it may be desirable to keep the geographical coverage broader (e.g. indicating that NL is covered), so as not to expose detailed information about subjects’ locations.
modification date Date on which the resource was changed. dct:modified xsd:dateTime 0..1 This is not the same as the modification date of the most recently modified dataset in the dataset series.
publisher An entity responsible for making the resource available. dct:publisher foaf:Agent 0..1 The publisher of the dataset series may not be the publisher of all datasets. E.g. a digital archive could take over the publishing of older datasets in the series.
release date Date of formal issuance of the resource. dct:issued xsd:dateTime 0..1 The moment when the dataset series was established as a managed resource. This is not equal to the release date of the oldest dataset in the collection of the dataset series.
temporal coverage Temporal characteristics of the resource. dct:temporal dct:PeriodOfTime 0..n When temporal coverage is a dimension of the dataset series, the temporal coverage of each dataset in the collection should fall within the temporal coverage of the series. In that case, an open-ended value is recommended, e.g. after 2012.
title A name given to the resource. dct:title rdfs:Literal 1..n Provide a unique title for your Dataset Series, which can be repeated in multiple languages.
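The temporal coverage rule for a Dataset Series amounts to an interval-containment check. This is a minimal sketch that assumes periods are modelled with a small hypothetical `Period` dataclass instead of RDF. A missing bound is treated as open-ended, as in "after 2012", and the dataset names are made up.

```python
"""Check that each dataset's temporal coverage lies within its series' coverage.

Assumption: periods use a small dataclass rather than RDF; a bound of None
means the period is open on that side (e.g. "after 2012").
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Period:
    """Minimal stand-in for dct:PeriodOfTime (dcat:startDate / dcat:endDate)."""
    start: Optional[datetime] = None
    end: Optional[datetime] = None


def within(outer: Period, inner: Period) -> bool:
    """True if `inner` lies entirely inside `outer`; None bounds are unbounded."""
    if outer.start is not None and (inner.start is None or inner.start < outer.start):
        return False
    if outer.end is not None and (inner.end is None or inner.end > outer.end):
        return False
    return True


def datasets_outside_series(series: Period, datasets: Dict[str, Period]) -> List[str]:
    """Names of datasets whose coverage is not part of the series coverage."""
    return [name for name, period in datasets.items() if not within(series, period)]


# Hypothetical yearly datasets in an open-ended series starting in 2012.
series = Period(start=datetime(2012, 1, 1))
yearly = {
    "registry-2013": Period(datetime(2013, 1, 1), datetime(2013, 12, 31)),
    "registry-2014": Period(datetime(2014, 1, 1), datetime(2014, 12, 31)),
    "registry-2010": Period(datetime(2010, 1, 1), datetime(2010, 12, 31)),
}
print(datasets_outside_series(series, yearly))  # ['registry-2010']
```

Leaving the series end open means newly added yearly datasets remain covered. You do not need to update the series' `dct:temporal` each year.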

6.2.3. Distribution

Class name HealthDCAT-AP Definition Usage Note URI
Distribution A physical embodiment of the Dataset in a particular format. Used to describe the different ways that a single dataset can be made available, i.e. it can be downloaded or accessed online in one or more distributions (e.g. one as a downloadable .csv file, another as an access or query web page). dcat:Distribution
Property label Definition Property URI Range Cardinality Usage note
access service A data service that gives access to the distribution of the dataset. dcat:accessService dcat:DataService 0..1 This property links the distribution class to the corresponding data service(s).
access url A URL of the resource that gives access to a distribution of the dataset. E.g., landing page, feed, SPARQL endpoint. dcat:accessURL rdfs:Resource (IRI) 1 Add a link that points to where the dataset can be found. If it’s hosted in a Data Repository, include the link to its entry. For datasets not in a repository (like registries), but still available for secondary use, provide a link to an access request form or a webpage with instructions for accessing the data.
applicable legislation The legislation that is applicable to this resource. dcatap:applicableLegislation eli:LegalResource (IRI) 0..n The legislation that mandates the creation or management of the Distribution.
byte size The size of a distribution in bytes. dcat:byteSize xsd:nonNegativeInteger 1 Describes the size of the distribution (the actual file) in bytes, and is therefore expressed as a non-negative integer. If the actual size is not known, it can be estimated.
checksum The checksum property provides a mechanism that can be used to verify that the contents of a file or package have not changed. spdx:checksum spdx:Checksum 0..1 The checksum is related to the downloadURL. This property makes use of the spdx:Checksum class, which itself has two properties to indicate checksum algorithm and checksum value (see Checksum class for further details).
compression format The compression format of the distribution in which the data is contained in a compressed form, e.g., to reduce the size of the downloadable file. dcat:compressFormat dct:MediaType (IRI) 0..1 It MUST be expressed using a media type as defined in the official register of media types managed by IANA.
description An account of the resource. dct:description rdfs:Literal 0..n Provide specific details about the distribution here, complementing the description of the related Dataset. This field can be repeated for different language versions of the description.
documentation A page or document about this thing. foaf:page foaf:Document (IRI) 0..n This page can contain additional information about the distribution.
download URL The URL of the downloadable file in a given format. E.g., CSV file or RDF file. The format is indicated by the distribution’s dct:format and/or dcat:mediaType. dcat:downloadURL rdfs:Resource (IRI) 0..1 If the dataset is openly accessible and available in a repository, you can directly include a link to the downloadable file here.
format The file format, physical medium, or dimensions of the resource. dct:format dct:MediaTypeOrExtent (IRI) 1 This property can be used to describe a media format in more detail than "media type" (using IANA media type) when needed. Instances of this property should use a URL, e.g., from the File Type vocabulary. For instance, for mzML files the value of this property could be: http://edamontology.org/format_3244
language A language of the resource. dct:language dct:LinguisticSystem (IRI) 0..n Indicates the natural language used in the Distribution, indicated with a value from the EU controlled vocabulary. Not all distributions might have a language, for example imaging data.
Note that here the Health-RI model diverges from DCAT-AP NL, which allows a maximum of 1 language per Distribution. The Health-RI model allows multiple languages in the same Distribution.
licence A legal document giving official permission to do something with the resource. dct:license dct:LicenseDocument (IRI) 1 For public data, use a Creative Commons (CC) license (see Geonovum options in the Controlled Vocabulary column). For most National Health Data Catalogue distributions, where data is not public, use the 'not open' license from Geonovum and specify data reuse agreements in the dct:rights property.
linked schemas An established standard to which the described resource conforms. dct:conformsTo dct:Standard (IRI) 0..n This property SHOULD be used to indicate the model, schema, ontology, view or profile that this representation of a dataset conforms to, in a machine-readable form. This is (generally) a complementary concern to the media-type or format. Use a reference to the official publication of the respective schema.
media type The media type of the distribution as defined by IANA. dcat:mediaType dct:MediaType (IRI) 0..1 Use the specified vocabularies, prioritizing IANA media types whenever possible. If unavailable, consider other ontologies, such as ZonMw generic terms, to describe the format.
Example: https://www.iana.org/assignments/media-types/text/csv (for csv)
If IANA media types do not sufficiently describe the format, use "format" to describe it in more detail.
modification date Date on which the resource was changed. dct:modified xsd:dateTime 0..1
packaging format The package format of the distribution in which one or more data files are grouped together, e.g., to enable a set of related files to be downloaded together. dcat:packageFormat dct:MediaType (IRI) 0..1 It SHOULD be expressed using a media type as defined in the official register of media types managed by IANA.
release date Date of formal issuance of the resource. dct:issued xsd:dateTime 0..1 The date the dataset distribution was issued.
retention period A temporal period during which the dataset is available for secondary use. healthdcatap:retentionperiod dct:PeriodOfTime 0..1 This property makes use of the class dct:PeriodOfTime, in which a start and end date should be provided.
rights Information about rights held in and over the resource. dct:rights dct:RightsStatement (IRI) 1 A statement that concerns all rights not addressed in the License or Access rights fields. In case of non-open data (as specified in the dct:license property), further agreements regarding data reuse (e.g. a Data Transfer Agreement, DTA) should be stated in this property.
The rights statement should be a free-text statement placed at a web-accessible location, such that the value of this property is the IRI pointing to that statement.
Current status: This recommendation on how to state data transfer/reuse conditions will be piloted in 2025.
status The status of the Asset in the context of a particular workflow process. adms:status skos:Concept (IRI) 0..1 It MUST take one of the values Completed, Deprecated, Under Development, Withdrawn from the provided controlled vocabulary.
temporal resolution Minimum time period resolvable in the dataset. dcat:temporalResolution xsd:duration 0..1 If applicable, this property indicates the minimum time period resolvable in the dataset distribution, expressed in xsd:duration format (for more information, see: https://www.w3schools.com/xml/schema_dtypes_date.asp)
title A name given to the resource. dct:title rdfs:Literal 1..n A title given to the distribution. Example: Data Access Request of Healthy Brain study
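Several Distribution values (byte size, media type, temporal resolution) can be derived from the data instead of being typed by hand. The sketch below makes two assumptions: the distribution file is available locally, and Python's `mimetypes` guess from the file extension is adequate for `dcat:mediaType`. The function names are hypothetical. `xsd_duration` emits only days, hours, minutes and seconds. That covers resolutions such as 15-minute or daily measurements, but not months or years.

```python
"""Derive some Distribution property values from a local file.

Sketch under assumptions: the file is available locally, a `mimetypes` guess
is adequate for dcat:mediaType, and resolutions are whole days/hours/minutes/
seconds. Function names are illustrative only.
"""
import mimetypes
import os
from datetime import timedelta
from typing import Dict, Optional, Union

IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def media_type_iri(path: str) -> Optional[str]:
    """IANA media type IRI guessed from the file extension, or None if unknown."""
    media_type, _encoding = mimetypes.guess_type(path)
    return IANA_MEDIA_TYPES + media_type if media_type else None


def xsd_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as xsd:duration; sub-second parts are dropped."""
    if delta < timedelta(0):
        raise ValueError("dcat:temporalResolution must not be negative")
    days, rest = divmod(int(delta.total_seconds()), 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    # Only non-zero components are written, e.g. PT15M rather than PT0H15M0S.
    units = ((hours, "H"), (minutes, "M"), (seconds, "S"))
    time_part = "".join(f"{n}{unit}" for n, unit in units if n)
    if not days and not time_part:
        return "PT0S"
    return "P" + (f"{days}D" if days else "") + ("T" + time_part if time_part else "")


def distribution_facts(path: str) -> Dict[str, Union[int, Optional[str]]]:
    """Values for dcat:byteSize and dcat:mediaType of a local file."""
    return {
        "dcat:byteSize": os.path.getsize(path),
        "dcat:mediaType": media_type_iri(path),
    }


print(xsd_duration(timedelta(minutes=15)))        # PT15M
print(xsd_duration(timedelta(days=1, hours=12)))  # P1DT12H
```

For a `.csv` file, `media_type_iri` will typically yield the `text/csv` IRI shown in the media type row. However, the guess depends on the platform's MIME tables, so check the result before publishing it.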

7. Supportive Classes

Supportive Classes provide additional context and metadata. They enhance discoverability.

7.1. Mandatory

7.1.1. Agent

Class name HealthDCAT-AP Definition Usage Note URI
Agent Any entity carrying out actions with respect to the (Core) entities Catalogue, Datasets, Data Services and Distributions. A person or organisation that is associated with the catalogue or dataset. This class is instantiated in these classes whenever the range is foaf:Agent. foaf:Agent
Property label Definition Property URI Range Cardinality Usage note
country Spatial characteristics of the resource. dct:spatial dct:Location (IRI) 0..n Use the appropriate term from the EU authority table. Example for the Netherlands: http://publications.europa.eu/resource/authority/country/NLD
email An email address via which contact can be made. This property SHOULD be used to provide the email address of the Agent, specified using the fully qualified mailto: URI scheme [RFC6068]. The email SHOULD be used to establish a communication channel to the agent. foaf:mbox rdfs:Resource (IRI) 1 It is preferred to provide a general email address of the appropriate institution or department instead of a personal email address, even if an individual is described with this instance of foaf:Agent.
This protects their privacy and enables contact about the resource even after the individual has left the institution.

The email address must be provided with the mailto: prefix.
For example: mailto:info@example.com
identifier An unambiguous reference to the resource within a given context. dct:identifier rdfs:Literal 1..n Specify the entity (person or organization) by providing an identifier. We recommend using an ORCID identifier for a person or a ROR identifier for an organization. Dutch government organizations are listed in TOOI. If these are not available, you can use ISNI, Wikidata or any other identifier you may have. If you have multiple identifiers, you should provide them all.
name A name for some thing. foaf:name rdfs:Literal 1..n This property refers to the name of the entity. Example: Jane Doe (for a person) and Radboudumc (for an organization). This property can be repeated for different versions of the name (e.g. the name in different languages).
publisher note A description of the publisher activities. healthdcatap:publishernote rdfs:Literal 0..1 This property can be repeated for parallel language versions of the publisher notes. Example: "Sciensano is a research institute and the national public health institute of Belgium. It is a so-called federal scientific institution that operates under the authority of the federal minister of Public Health and the federal minister of Agriculture of Belgium."@en
publisher type A type of organisation that makes the Dataset available. healthdcatap:publishertype skos:Concept (IRI) 0..1 Current status: a controlled vocabulary of commonly recognised health publishers is being developed specifically for the health domain. Version 1.0 includes the following types: Academia-ScientificOrganisation, Company, IndustryConsortium, LocalAuthority, NationalAuthority, NonGovernmentalOrganisation, NonProfitOrganisation, PrivateIndividual, RegionalAuthority, StandardisationBody and SupraNationalAuthority. These values should use the following URL pattern: http://purl.org/adms/publishertype/[type].
type The nature or genre of the resource. dct:type skos:Concept (IRI) 0..1 This property should be filled using ADMS vocabulary (http://purl.org/adms/publishertype/1.0)
URL A homepage for some thing. foaf:homepage rdfs:Resource (IRI) 1 Provide the URL of the page containing contact information, such as a contact form or details for reaching out. If a specific contact page is unavailable, the main website of the Agent is sufficient.
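Two Agent values are easy to get subtly wrong: the publisher type IRI and the ORCID iD recommended for dct:identifier. The sketch below builds the former from the Version 1.0 type names listed in the publisher type row. It checks the latter's final character with the ISO 7064 MOD 11-2 check digit that ORCID iDs use. Both function names are hypothetical. The ORCID check catches typos only and does not confirm that the iD is registered.

```python
"""Helpers for two Agent property values (illustrative names, not an API)."""

PUBLISHER_TYPE_BASE = "http://purl.org/adms/publishertype/"
# Version 1.0 publisher types as listed in the publisher type usage note.
PUBLISHER_TYPES = {
    "Academia-ScientificOrganisation", "Company", "IndustryConsortium",
    "LocalAuthority", "NationalAuthority", "NonGovernmentalOrganisation",
    "NonProfitOrganisation", "PrivateIndividual", "RegionalAuthority",
    "StandardisationBody", "SupraNationalAuthority",
}


def publisher_type_iri(type_name: str) -> str:
    """Build the publisher type IRI, rejecting names outside the listed types."""
    if type_name not in PUBLISHER_TYPES:
        raise ValueError(f"unknown publisher type: {type_name}")
    return PUBLISHER_TYPE_BASE + type_name


def orcid_is_valid(orcid: str) -> bool:
    """Check the ISO 7064 MOD 11-2 check character of a bare or orcid.org iD."""
    prefix = "https://orcid.org/"
    bare = orcid[len(prefix):] if orcid.startswith(prefix) else orcid
    bare = bare.replace("-", "")
    digits = "0123456789"
    if len(bare) != 16 or any(c not in digits for c in bare[:15]):
        return False
    total = 0
    for c in bare[:15]:
        total = (total + int(c)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X" in the last position.
    return bare[15] == ("X" if result == 10 else str(result))


print(publisher_type_iri("NationalAuthority"))
# http://purl.org/adms/publishertype/NationalAuthority
print(orcid_is_valid("https://orcid.org/0000-0002-1825-0097"))  # True
```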

7.1.2. Kind

Class name HealthDCAT-AP Definition Usage Note URI
Kind A description following the vCard specification. Used to describe contact information for Dataset, Dataset Series and Data Service. This class is instantiated in these classes whenever the range is vcard:Kind. vcard:Kind
Property label Definition Property URI Range Cardinality Usage note
contact page To specify a uniform resource locator associated with the object. vcard:hasURL rdfs:Resource (IRI) 0..n A web page that either allows users to make contact (e.g. a web form) or contains information on how to get in contact.
has email To specify the electronic mail address for communication with the object. vcard:hasEmail rdfs:Resource (IRI) 1 A contact point must be accompanied by an email address. This email address does not need to be a direct contact to the person responsible for the management of the data; it could be a generic information email address.
The email address must be provided with the mailto: prefix.
For example: mailto:info@example.com / mailto:jane.doe@example.com
formatted name The formatted text corresponding to the name of the object. vcard:fn xsd:string 1 Provide the full name of the contact point, such as the name of a person or department responsible for communication.
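Because vcard:hasEmail (and foaf:mbox for an Agent) expects a `mailto:` IRI, pasted addresses benefit from light normalisation. This applies especially to stray spaces after the prefix. The sketch below uses a hypothetical `to_mailto` function, and its regular expression is a deliberately loose plausibility check, not a full email-address validator.

```python
"""Normalise an email address into a mailto: IRI for vcard:hasEmail."""
import re

# Loose plausibility check: something@something.tld, no whitespace.
_ADDRESS = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def to_mailto(address: str) -> str:
    """Return 'mailto:<address>', tolerating an existing prefix and stray spaces."""
    value = address.strip()
    if value.lower().startswith("mailto:"):
        value = value[len("mailto:"):].strip()
    if not _ADDRESS.fullmatch(value):
        raise ValueError(f"not a plausible email address: {address!r}")
    return "mailto:" + value


print(to_mailto("info@example.com"))              # mailto:info@example.com
print(to_mailto("mailto: jane.doe@example.com"))  # mailto:jane.doe@example.com
```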

7.2. Recommended

7.2.1. Attribution

Class name HealthDCAT-AP Definition Usage Note URI
Attribution Attribution is the ascribing of an entity to an agent. This class is instantiated by the property "qualified attribution" (prov:qualifiedAttribution) in other classes. Use this class to describe any Agent (other than publisher or creator) that has some form of responsibility for the resource. Within the class, this Agent is described with an instance of foaf:Agent, and the role is chosen from a controlled vocabulary. This class can be used to indicate the funding agent that provided funding for the dataset. prov:Attribution
Property label Definition Property URI Range Cardinality Usage note
agent The prov:agent property references a prov:Agent which influenced a resource. prov:agent foaf:Agent 0..1 This property points to another instance of class foaf:Agent.
role The function of an entity or agent with respect to another entity or resource. dcat:hadRole rdfs:Resource (IRI) 0..1 Choose one of the roles as listed in the controlled vocabulary. Note that for HealthDCAT-AP, the list of roles might be extended in the future.
Example: https://standards.iso.org/iso/19115/resources/Codelists/gml/CI_RoleCode.xml#processor
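An Attribution pairs an agent with a role IRI from the ISO 19115 role codelist shown in the example above. The sketch represents one attribution as a plain dict using a hypothetical `attribution` helper. `ROLE_CODES` is an assumed subset of CI_RoleCode values, so check the codelist itself before relying on any of them. The funder IRI is a placeholder.

```python
"""Build a prov:Attribution description as a plain dict (illustrative only)."""
from typing import Dict

ROLE_BASE = "https://standards.iso.org/iso/19115/resources/Codelists/gml/CI_RoleCode.xml#"
# Assumed subset of CI_RoleCode values; verify against the codelist.
ROLE_CODES = {
    "author", "custodian", "distributor", "funder", "originator", "owner",
    "pointOfContact", "principalInvestigator", "processor", "publisher",
    "rightsHolder", "sponsor",
}


def attribution(agent_iri: str, role: str) -> Dict[str, str]:
    """prov:agent plus dcat:hadRole for one qualified attribution."""
    if role not in ROLE_CODES:
        raise ValueError(f"role {role!r} is not in the assumed CI_RoleCode subset")
    return {"prov:agent": agent_iri, "dcat:hadRole": ROLE_BASE + role}


# Hypothetical funding agent, as suggested in the class usage note.
print(attribution("https://example.org/org/funding-agency", "funder"))
```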

7.2.2. Checksum

Class name HealthDCAT-AP Definition Usage Note URI
Checksum A value that allows the contents of a file to be authenticated. This class is instantiated by properties in other classes that have the range spdx:Checksum. spdx:Checksum
Property label Definition Property URI Range Cardinality Usage note
algorithm Identifies the algorithm used to produce the subject Checksum. spdx:algorithm spdx:ChecksumAlgorithm (IRI) 1 Choose one of the checksum algorithms listed on the webpage linked in the Controlled Vocabulary column.
checksum value The checksumValue property provides a lower case hexadecimal encoded digest value produced using a specific algorithm. spdx:checksumValue rdfs:Literal 1
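A Checksum instance needs two values: the algorithm IRI and the lower-case hexadecimal digest of the file behind dcat:downloadURL. The sketch below assumes the file has already been fetched locally. It also assumes the SPDX algorithm individuals follow the `checksumAlgorithm_<name>` pattern in the SPDX namespace, so verify these IRIs against the linked vocabulary. The `file_checksum` name is hypothetical.

```python
"""Compute an spdx:Checksum (algorithm + value) for a local file."""
import hashlib
from typing import Dict

SPDX = "http://spdx.org/rdf/terms#"
# Assumed SPDX algorithm IRIs; check them against the controlled vocabulary.
ALGORITHM_IRIS = {
    "md5": SPDX + "checksumAlgorithm_md5",
    "sha1": SPDX + "checksumAlgorithm_sha1",
    "sha256": SPDX + "checksumAlgorithm_sha256",
}


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return spdx:algorithm and spdx:checksumValue for the file at `path`."""
    if algorithm not in ALGORITHM_IRIS:
        raise ValueError(f"no SPDX IRI mapped for {algorithm!r}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large distributions need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # hexdigest() is lower-case hexadecimal, as spdx:checksumValue requires.
    return {"spdx:algorithm": ALGORITHM_IRIS[algorithm], "spdx:checksumValue": digest.hexdigest()}
```

The checksum must be recomputed whenever the file at dcat:downloadURL changes, because its purpose is to detect exactly such changes.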

7.2.3. Identifier

Class name HealthDCAT-AP Definition Usage Note URI
Identifier An identifier in a particular context, consisting of the string that is the identifier; an optional identifier for the identifier scheme; an optional identifier for the version of the identifier scheme; an optional identifier for the agency that manages the identifier scheme. This class is instantiated by the property "other identifier" (adms:identifier) in other classes. Use this class to provide any additional identifier to the resource or dataset that is not the primary identifier provided in dct:identifier. adms:Identifier
Property label Definition Property URI Range Cardinality Usage note
notation A string that is an identifier in the context of the identifier scheme referenced by its datatype. skos:notation xsd:string 1 The value of this property is the alternative identifier of the dataset, next to the one indicated in the dct:identifier property.
schema agency The name of the agency that issued the identifier. adms:schemaAgency xsd:string 0..1

7.2.4. Period of time

Class name HealthDCAT-AP Definition Usage Note URI
Period of Time An interval of time that is named or defined by its start and end dates. This class is instantiated by properties in other classes that have the range dct:PeriodOfTime. dct:PeriodOfTime
Property label Definition Property URI Range Cardinality Usage note
end date The end of the period. dcat:endDate xsd:dateTime 0..1
start date The start of the period. dcat:startDate xsd:dateTime 0..1
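In RDF, a Period of Time is usually written as a blank node carrying one or both date properties. The sketch below renders that node as Turtle from Python `datetime` values and omits a missing bound, matching the 0..1 cardinalities. It assumes the `dct:`, `dcat:` and `xsd:` prefixes are declared elsewhere in the Turtle document. The function name is hypothetical.

```python
"""Render a dct:PeriodOfTime as a Turtle blank node (prefixes declared elsewhere)."""
from datetime import datetime
from typing import Optional


def period_of_time_turtle(start: Optional[datetime] = None, end: Optional[datetime] = None) -> str:
    """Return '[ a dct:PeriodOfTime ; ... ]'; a None bound is simply left out."""
    if start is not None and end is not None and start > end:
        raise ValueError("dcat:startDate must not be after dcat:endDate")
    parts = ["a dct:PeriodOfTime"]
    if start is not None:
        # isoformat() yields a valid xsd:dateTime lexical form, e.g. 2020-01-01T00:00:00
        parts.append(f'dcat:startDate "{start.isoformat()}"^^xsd:dateTime')
    if end is not None:
        parts.append(f'dcat:endDate "{end.isoformat()}"^^xsd:dateTime')
    return "[ " + " ;\n  ".join(parts) + " ]"


print(period_of_time_turtle(datetime(2020, 1, 1), datetime(2024, 12, 31, 23, 59, 59)))
```

This prints a three-line blank node with both dates. Mixing timezone-aware and naive datetimes makes the start/end comparison raise `TypeError`, so pick one convention per catalogue.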

7.2.5. Quality certificate

Class name HealthDCAT-AP Definition Usage Note URI
Quality Certificate An annotation that associates a resource (especially, a dataset or a distribution) to another resource (for example, a document) that certifies the resource’s quality according to a set of quality assessment rules. This class is instantiated by the property "quality annotation" (dqv:hasQualityAnnotation) in other classes. Use this class to provide a link between the resource or dataset and an associated quality annotation. dqv:QualityCertificate
Property label Definition Property URI Range Cardinality Usage note
target The relationship between an Annotation and its Target. oa:hasTarget rdfs:Resource (IRI) 0..1 This property must be filled with the same value as the dct:identifier of the dataset described, in order to link the quality certificate to that dataset.
See also example in HealthDCAT-AP: https://healthdcat-ap.github.io/#dqvhasqualityannotation
body The object of the relationship is a resource that is a body of the Annotation. oa:hasBody rdfs:Resource (IRI) 0..1 IRI pointing to the location where the quality certificate can be found.

7.2.6. Relationship

Class name HealthDCAT-AP Definition Usage Note URI
Relationship An association class for attaching additional information to a relationship between DCAT Resources. This class is instantiated by the property "qualified relation" (dcat:qualifiedRelation) in other classes. Use this class to describe a relationship with another resource or dataset. Within the class, that resource is indicated, as well as the role this resource has in relation to the described one. The role is indicated based on a controlled vocabulary. dcat:Relationship
Property label Definition Property URI Range Cardinality Usage note
had role The function of an entity or agent with respect to another entity or resource. dcat:hadRole skos:Concept (IRI) 1..n Specify, ideally with a value from the linked controlled vocabulary, the nature of the relationship between the linked resources.
Example: http://www.iana.org/assignments/relation/related
relation A related resource. dct:relation rdfs:Resource (IRI) 1..n This property establishes the link between the described and the related resources. The value of this property is the IRI of the related resource.