MRTGv08 Type term inconsistent with DwC

From KeyToNature

Information about submission as TDWG standard
MRTG Wiki Homepage
Current Schema Draft. This version is under internal review as part of the submission to TDWG.
Audubon Core Non normative document
MRTG Development History
MRTG Meeting Notes
MRTG Best Practices
XML Schema representation of Audubon Core
RDF representation of Audubon Core

This page is for discussion of the inconsistency in recommendations for the Type term noticed by Steve Baskauf 05:34, 14 October 2009 (CEST)



1 Summary of the problem

Propose one-sentence summaries here.

  • When applied to a publication artifact such as an image or drawing, Dublin Core uses dcterms:type to describe the type of artifact, whereas DarwinCore uses it to describe the type of the content.--BobMorris 14:55, 23 October 2009 (CEST)


2 Open letter to TDWG and DarwinCore (2009-10-23)

The letter is documented here to facilitate further in-wiki MRTG discussions.

Dear John, we (Gregor Hagedorn, Bob Morris, Steve Baskauf) realize that the public review period for DarwinCore (DwC) is over, but we believe we need to bring a potentially highly problematic issue to your attention. This issue was originally found by Steve Baskauf. Essentially, it is an issue that is not very apparent when reading DarwinCore for review, but that becomes evident when trying to implement it in combination with other technologies.

DarwinCore seems to use dcterms:type in a way that is inconsistent with the DublinCore (DC) recommendations for publication artifacts, which is the way most users of DC are likely to use dcterms:type. Steve pointed out that MRTG's use, which does follow the DC recommendation, is inconsistent with DwC. We believe that this problem is not specific to MRTG; it equally occurs, e.g., where natural history collections collaborate with the culture and library initiative Europeana.eu, which also uses DublinCore type in the original sense.

DublinCore dcterms:type has an explicit type vocabulary. Its annotation at http://dublincore.org/documents/dcmi-terms/#terms-type says: "Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]." This vocabulary (http://dublincore.org/documents/dcmi-type-vocabulary/) defines values such as StillImage, Sound, MovingImage, and Text.

In contrast, the DwC type vocabulary acts on an abstract level of recording occurrences that are independent of physical records. These occurrences can then be vouchered by physical resources like specimens, photos, movies, etc. The actual resources treated in DublinCore are therefore only potential vouchers for a DarwinCore resource. The terms recommended for DublinCore "type" are therefore expected in the DarwinCore "basisOfRecord" property.

We do not mean to imply that there is anything wrong with the DarwinCore perspective. Unfortunately, however, we believe that DarwinCore cannot coexist with DublinCore data as long as DarwinCore does not define its own dwc:type/dwc:abstractType property.


Test case: An image showing a taxon observation shall be documented both in DarwinCore and DublinCore.

  • DarwinCore prescribes or recommends dcterms:type=Occurrence plus basisOfRecord=StillImage.
  • DublinCore recommends dcterms:type=StillImage.
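As an illustration of the test case, the two descriptions of the same image can be written as plain Python dictionaries. This is a minimal sketch: the record layout and the helper `conflicting_keys` are hypothetical, and only the term names and values come from the test case. Comparing the records shows that the shared key dcterms:type carries two different values.

```python
# Hypothetical records for one image of a taxon observation.
# Term names and values follow the test case; the layout is illustrative.
dwc_record = {
    "dcterms:type": "Occurrence",        # DwC: abstract record type
    "dwc:basisOfRecord": "StillImage",   # DwC: kind of evidence
}

dc_record = {
    "dcterms:type": "StillImage",        # DC: type of the artifact itself
}


def conflicting_keys(a: dict, b: dict) -> dict:
    """Return keys present in both records whose values differ."""
    return {
        key: (a[key], b[key])
        for key in a.keys() & b.keys()
        if a[key] != b[key]
    }


if __name__ == "__main__":
    print(conflicting_keys(dwc_record, dc_record))
    # {'dcterms:type': ('Occurrence', 'StillImage')}
```

A consumer merging both descriptions into one record cannot keep both values under a single non-repeatable dcterms:type.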


We have internally begun to discuss possible solutions. In DublinCore, dcterms:type does not express a particular type of metadata record; it is metadata about the underlying resource. One option that minimizes the necessary design changes in DwC would therefore be to replace the DwC use of dcterms:type with a property in the dwc namespace, and to replace dwc:basisOfRecord with dcterms:type. Some other issues arise depending on how one tries to bring DwC into closer coherence with the DublinCore recommendations, but perhaps these are best put forth on a wiki.
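The option of moving the DwC record type out of dcterms:type and promoting dwc:basisOfRecord to dcterms:type amounts to a simple key remapping. The sketch below assumes flat dictionary records; the property name `dwc:abstractType` is only the placeholder mentioned earlier in the letter, not a term defined by DarwinCore, and `remap_dwc_record` is a hypothetical helper.

```python
# Hypothetical name for the new dwc-namespace property; DarwinCore
# does not define such a term.
ABSTRACT_TYPE = "dwc:abstractType"


def remap_dwc_record(record: dict) -> dict:
    """Move the DwC record type out of dcterms:type and promote
    dwc:basisOfRecord to dcterms:type; other keys are copied unchanged."""
    out = dict(record)  # work on a copy so the caller's record is untouched
    if "dcterms:type" in out:
        out[ABSTRACT_TYPE] = out.pop("dcterms:type")
    if "dwc:basisOfRecord" in out:
        out["dcterms:type"] = out.pop("dwc:basisOfRecord")
    return out
```

For the image in the test case, `remap_dwc_record({"dcterms:type": "Occurrence", "dwc:basisOfRecord": "StillImage"})` yields `{"dwc:abstractType": "Occurrence", "dcterms:type": "StillImage"}`, so dcterms:type then agrees with the DublinCore recommendation. Values such as PreservedSpecimen are not DCMI Type terms, however, so they would still need a place in the DCMI vocabulary.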

Here we would only like to point out that we believe the values for basisOfRecord fit into the dcterms:type vocabulary. Observations (dwc:HumanObservation and dwc:MachineObservation) may be placed as subtypes of http://purl.org/dc/dcmitype/Event, and specimens (dwc:PreservedSpecimen, dwc:FossilSpecimen, dwc:LivingSpecimen) as subtypes of http://purl.org/dc/dcmitype/PhysicalObject. For different communities, the dwc specimen types may have to be further subtyped as "Seed", "TissueSample", "DNA_Sample".
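The proposed placement can be written down as a lookup table from basisOfRecord values to DCMI Type terms. This is a sketch of the proposal, not an adopted mapping: the observation and specimen rows follow the paragraph above, while the rows for StillImage, MovingImage, and Sound are added on the assumption that values which already are DCMI Type terms map to themselves. The community subtypes ("Seed", etc.) are left out because their parent specimen type is not specified.

```python
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Proposed DCMI supertype for each basisOfRecord value.
SUPERTYPE = {
    "HumanObservation": "Event",
    "MachineObservation": "Event",
    "PreservedSpecimen": "PhysicalObject",
    "FossilSpecimen": "PhysicalObject",
    "LivingSpecimen": "PhysicalObject",
    # Assumption: values that are already DCMI Type terms map to themselves.
    "StillImage": "StillImage",
    "MovingImage": "MovingImage",
    "Sound": "Sound",
}


def dcmi_type_uri(basis_of_record: str) -> str:
    """Return the DCMI Type URI under which a basisOfRecord value would fall."""
    try:
        return DCMITYPE + SUPERTYPE[basis_of_record]
    except KeyError:
        raise ValueError(
            f"no proposed DCMI supertype for {basis_of_record!r}"
        ) from None
```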

However, we believe it is not possible to create a hierarchy like

 "StillImage - isSubtypeOf - Image - isSubtypeOf - Occurrence"

because this is a use-case dependent view: A character image may be a subtype of a taxon representation, and it may or may not be a subtype of an occurrence representation.


3 References

  1. dcterms = http://dublincore.org/documents/dcmi-terms/
  2. ncd = http://rs.tdwg.org/ontology/voc/Collection#
  3. k2n = http://www.keytonature.eu/std/metadata/2009/xmlns/ - Note: The namespace is not resolvable, the specification can be found here
Name: Type
Normative URI: dcterms:type [1]
Layer: Core; Required: Yes; Repeatable: No
Definition: Any DCMI type term from http://dublincore.org/documents/dcmi-type-vocabulary/ may be used. Recommended terms are Collection, StillImage, Sound, MovingImage, InteractiveResource, and Text.
Comments: A Collection should be given type http://purl.org/dc/dcmitype/Collection. If the resource is a Collection, this item does not identify what types of objects it may contain. Following the DC recommendations at http://purl.org/dc/dcmitype/Text, images of text should be marked as Text.
Crosswalk:
  • DublinCore: dcterms:type [1]
  • XMP:
  • DarwinCore:
  • NCD: ncd:CollectionType [2]
  • Morphbank:
  • NBII: Type
  • K2N: k2n:Type [3] pro parte
  • MIX2.0:
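A validator for this term could check values against the DCMI Type Vocabulary and flag whether they are among the recommended terms. The sketch below assumes that both the bare DCMI label and the full DCMI URI are accepted, which is exactly the open question in the Discussion section; the function names are hypothetical.

```python
DCMITYPE_NS = "http://purl.org/dc/dcmitype/"

# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})

# Subset recommended in the Definition of the Type term.
RECOMMENDED = frozenset({
    "Collection", "StillImage", "Sound", "MovingImage",
    "InteractiveResource", "Text",
})


def normalize_type(value: str) -> str:
    """Return the full DCMI Type URI for a label or URI.

    Assumption: both forms are acceptable input."""
    label = value[len(DCMITYPE_NS):] if value.startswith(DCMITYPE_NS) else value
    if label not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not a DCMI Type term")
    return DCMITYPE_NS + label


def is_recommended(value: str) -> bool:
    """True if the value is valid and one of the recommended terms."""
    return normalize_type(value)[len(DCMITYPE_NS):] in RECOMMENDED
```

Note that a DwC value such as "Occurrence" raises an error here, which is the inconsistency this page discusses.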
Discussion
  • Do we mean to require the DCMI URL, or do we accept these DCMI labels? --BobMorris 17:38, 14 March 2009 (CET)
  • Now that DwC has been accepted as a TDWG standard, I have returned to a previous task, which was to try to hammer out a schema for the SERNEC plant image collection. This collection will integrate live plant images with images from specimens. Thus, when it is done, the schema will mostly be imported from the DwC schema (for specimen metadata) and the MRTG schema (for images). Both schemas include the "dcterms:" namespace and accept dcterms:type as the element that identifies the class into which the resource falls. However, the problem is that the recommended terms given under the DwC dcterms:type and the terms given here for dcterms:type are in conflict. DwC recommends "Occurrence", "Event", "Location", and "Taxon" as terms, while the MRTG recommendations are listed above. If there were no overlap in the function served by images, this would not be a conflict. However, live plant images are records of occurrence just as specimens are, so functionally their object class should be "occurrence", because many of the metadata elements associated with the live plant images will be the same as those of the specimens.
    In DwC, the poorly named element "basisOfRecord" (which is defined as "a subtype of dcterms:type") is the functional equivalent of mrtg:subtype and has the recommended terms "StillImage", "MovingImage", "Sound", "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "HumanObservation", and "MachineObservation". Thus under DwC, the live plant image should be designated as a still image by setting the value of basisOfRecord to StillImage, and not by using StillImage as the value of dcterms:type. I will grant that the recommended values listed here in the MRTG schema are true to the DCMI type vocabulary. However, that vocabulary is generic, while the DwC class designations are specifically tailored to the biodiversity community.
    This issue is not a trivial one, because under the TDWG LSID Minimal Standards the value of dcterms:type (i.e. the object class) is going to be one of the few metadata elements that all LSID-identified resources will have, and the value returned for dcterms:type when the LSID is resolved will determine the nature of most of the subsequent metadata provided during the LSID resolution process. To return to my concrete example of a live plant image: when its LSID is resolved, it must first say "I am a primary biodiversity record (occurrence)". Then, given that the type is an occurrence, the consuming application can expect to be told that the flavor of occurrence (subtype or basisOfRecord, whichever you prefer) is image rather than specimen or something else, and can expect to receive additional metadata specific to images. We have to get this right because one of the "important uses" of multimedia metadata stated in the MRTG non-normative document (number 3, to be exact) is "use of metadata records as potential taxon occurrence evidence". Unless the problem is fixed, dcterms:type will be used inconsistently by people like me, who consider their images to be primarily occurrence records and use the DwC suggestions, and by others who don't care about that and use the MRTG suggestions.
    I have a lot more that I could say about this issue because it is critical for me, but in the interest of space I'll restrain myself. The bottom line is that dcterms:type is probably the most important element in both the DwC and MRTG schemas, and the acceptable values for it must be consistent. So it is right that dcterms:type is one of the mandatory MRTG elements, but the list of acceptable values for it, and for whatever you want to call "subtype" or "basisOfRecord", is probably something that needs to be hammered out by the TDWG Technical Architecture Group. Steve Baskauf 05:34, 14 October 2009 (CEST)
  • Many thanks for pointing this out. I agree this is a serious issue. However, a solution seems to be difficult. DublinCore precedes DarwinCore and is widely accepted. DarwinCore seems to preclude the established use of the DublinCore vocabulary for this element (http://dublincore.org/documents/dcmi-type-vocabulary/). What shall we do? --Gregor Hagedorn 22:20, 15 October 2009 (CEST)
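The two-step reading described in the LSID comment (first dcterms:type, then basisOfRecord) can be sketched as consumer-side routing logic. This is a hypothetical illustration, not part of the TDWG LSID Minimal Standards or of any resolver API; `classify` and its output labels are invented for the example. It shows that the same photograph is routed differently depending on which recommendation its provider followed.

```python
# Hypothetical consumer logic for resolved metadata records.
MEDIA_TYPES = {"StillImage", "MovingImage", "Sound"}


def classify(record: dict) -> str:
    """Return a coarse routing label for a resolved metadata record."""
    rtype = record.get("dcterms:type")
    if rtype == "Occurrence":
        # DwC style: occurrence first, then the flavor via basisOfRecord.
        basis = record.get("dwc:basisOfRecord", "unknown")
        return f"occurrence/{basis}"
    if rtype in MEDIA_TYPES:
        # DC/MRTG style: the artifact type is known, but nothing says
        # whether the resource is also occurrence evidence.
        return f"media/{rtype}"
    return "other"
```

A live plant image described the DwC way (`{"dcterms:type": "Occurrence", "dwc:basisOfRecord": "StillImage"}`) lands in `occurrence/StillImage`, whereas the same image described the MRTG way (`{"dcterms:type": "StillImage"}`) lands in `media/StillImage` and is never considered as occurrence evidence.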