Resource Metadata Exchange Agreement

From KeyToNature

THIS IS THE NEWEST, WORKING VERSION. IT NEEDS CLEANUP AT THE START!


CHANGES from version 2008 to 2009-03, all following MRTG decisions

Common Geoarea Name    changed to:   World Region // TEMPLATE OK, FREQUENTLY USED
Lowest Common Taxon    changed to:   Taxonomic Coverage // TEMPLATE OK, FREQUENTLY USED
Vernacular Names       changed to:   Common Names // TEMPLATE OK, FREQUENTLY USED
-> used by Tartu - email
-> already upgraded by PMSL
Homepage should no longer be used (but is still supported in the template where it has been used) // TEMPLATE OK

(Homepage used to be a special field applicable only to providers and, rarely, collections. Provider metadata are now collected through a separate agreement: wiki pages for each provider, containing an info box; check examples like NHM. Where collections have a URL page (other than the wiki collection page itself), simply put the URL into "Page Context URI".)

Exchange Formats -> dropped, now as Exchange 1 Format, Exchange 2 Format, Exchange 3 Format, each accompanied by a URI (Exchange 1 URI, Exchange 2 URI, Exchange 3 URI).

Note: {QL} = one of the quality levels.

Format changed to: {QL} Format
Extent changed to: {QL} Size, {QL} Image Width, {QL} Image Height, {QL} Duration,  
where {QL} is a quality level from Best Quality to Tiny Preview.
-> TEMPLATE NOT DONE !!!
-> Extent USED by PMSL
               -> already upgraded by PMSL
TODO GH: need to add quality-level specific licenses!!!!

IMPORTANT for data providers: Please add an Availability field for every quality level (e.g. Best Quality URI + Best Quality Availability). This is easily overlooked. Please note that the URL-like encoding of login-http, digitally published, etc. used in the first survey is no longer supported! See Resource Availability for Availability values and supported URLs.
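
The Availability requirement above can be checked mechanically before data are submitted. The following is a minimal sketch, assuming each record is held as a plain Python dict keyed by field name; the function name `missing_availability` and the example.org URLs are hypothetical and not part of this agreement.

```python
# Quality levels named in this agreement, from best to smallest.
QUALITY_LEVELS = [
    "Best Quality", "Good Quality", "Medium Quality",
    "Lower Quality", "Normal Preview", "Tiny Preview",
]

def missing_availability(record):
    """Return the quality levels that have a URI but no Availability value."""
    missing = []
    for level in QUALITY_LEVELS:
        uri = record.get(f"{level} URI", "").strip()
        availability = record.get(f"{level} Availability", "").strip()
        if uri and not availability:
            missing.append(level)
    return missing

# Hypothetical record: the thumbnail URI lacks its Availability field.
record = {
    "Best Quality URI": "http://example.org/images/123.jpg",
    "Best Quality Availability": "online (free)",
    "Tiny Preview URI": "http://example.org/thumbs/123.jpg",
}
print(missing_availability(record))
```

The sketch only reports that a value is missing; whether the value belongs to the supported vocabulary would have to be checked against the Resource Availability page.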

Important for Identification keys: Please add the exchange format URIs together with a Format (see above)

Changes in fields not currently used (no action necessary)

Imaging Technique      changed to:   Subtype            // TEMPLATE OK
Peer Reviewer Names    changed to:   Reviewer Names     // TEMPLATE OK
Peer Reviewer Comments changed to:   Reviewer Comments  // TEMPLATE OK
Subject Stage          changed to:   Subject Life Stage // TEMPLATE OK
Subject Direction      changed to:   Subject Orientation // TEMPLATE OK


New fields

Rating   // TABLE OK, TEMPLATE OK
Country Names // TABLE OK, TEMPLATE OK
Identified By  // TABLE OK, TEMPLATE OK
License URL  // TABLE OK, TEMPLATE OK
Attribution Statement  // TABLE OK, TEMPLATE OK
Attribution Logo URL  // TABLE OK, TEMPLATE OK
Attribution Link URL  // TABLE OK, TEMPLATE OK
Page Context URI  // TABLE OK, TEMPLATE OK
Exchange 1 URI  // TABLE OK, TEMPLATE OK
Exchange 2 URI  // TABLE OK, TEMPLATE OK
Exchange 3 URI  // TABLE OK, TEMPLATE OK

TODO in template: add Size, Height/Width, Duration. Check Availability display!


This page represents the continued development of the original Key to Nature Resource Metadata Exchange Agreement EU deliverable D 4.4 together with the GBIF Multimedia Resource Task Group (MRTG), Morphbank, and NBII to form a new common standard.


Key to Nature internal information:

The agreement is currently slightly inconsistent. It partly still refers to the original, file-based exchange, but is partly already modified for a new wiki-based strategy for collection resource metadata, intended to improve outreach and workflow.

Please help remove such inconsistencies, as well as general problems with the plan or its explanation, by adding discussions, comments, or improvement suggestions here on the wiki. They will then be incorporated into the next version.

The first implementation of this agreement on the Wiki is through a template (Template:Metadata, technical page, for examples take a look at the resource collections on any). Metadata may be recorded either in special metadata submission pages, or may be directly associated with identification tools on the wiki. (##Example yet missing##)


Summary

One aim of Key to Nature is to improve search, accessibility, use and re-use of identification tools themselves as well as of images, sounds, videos, or taxon pages involved in the creation of such tools. This is to be achieved by an online search tool that makes widely distributed digital objects searchable through a single user interface. Rather than merging data in a central repository, only metadata are integrated according to structural and content standards agreed upon in this metadata exchange agreement. Such an integration of metadata is a fairly common use case in digital libraries. A specific task is, however, to analyze which additional metadata concepts ("fields") are required in our use cases. The metadata concepts need to match the requirements both of queries and of subsequent query result reporting, which occasionally requires displaying information with more specific semantics than is strictly necessary for discovery purposes.

The metadata exchange agreement defines types of resources: a) institution or organization (“Provider”), b) resource collections, and c) individual resources like “IdentificationTool”, “StillImage”, “TaxonPage”, etc. The metadata for each resource include information about, e.g., title, keywords, taxa, geographic location, copyright, license, format, access type (printed, offline, online, free, login) and URIs. It supports resources available at different quality levels (e. g., high-resolution, web-optimized, or different thumbnail sizes) under different URIs, which is typical for images or sounds. Information in multiple languages (e. g., title in English as well as Slovenian) is supported. For each resource type, the agreement states which metadata items are required, urgently requested, or optional.

The metadata structure employed for data exchange so far is a flat table structure. For the first survey all data could be put into a single table and exchanged through appropriate document-based mechanisms (e. g., tab-delimited Unicode text files or Microsoft Access databases). However, a new method that uses the same metadata fields but a different, wiki-based exchange format is currently being tested and is outlined towards the end of this document (see here). We hope to overcome problems in workflow, quality control, distribution of work, and long-term maintainability with this method, while preserving the chance to integrate data providers without advanced IT capabilities. For high-tech data providers, installing XML-based data exchange through custom web services or standard OAI-PMH methods would be another alternative.


Scope

The purpose of this agreement is to share metadata about identification keys available in digital form and resources that are relevant to the creation and enhancement of such identification keys. The following resource types are considered in this agreement:

  • Identification tools (= identification data plus applications where necessary), ranging from multi-access keys and interactive branching keys to static branching keys (which may be available only as PDF). These resources may be valuable both as (a) primary data forming the core of an eLearning package and (b) as resources used to link to when an identification result is reached (e. g., genus keys may be the results of a family key).
  • Media files (still-images, audio, video) are valuable to illustrate either definition of terms (characters/character states) or their expression in specific taxa.
  • Complex documents (PDF, web pages) describing species or other taxa (“taxon pages”, “species pages”) are valuable as primary or secondary (“see also”) results of other keys.

The purpose of the data sharing is primarily to share information about the resources, not directly to share the resources themselves. In general, the project does not plan to store or provide resources from project servers. As soon as resource sharing is intended (e. g., a common identification tool repository, tiny thumbnails of images, reuse of images in an eLearning package) this requires separate agreements from the main metadata agreement.


The metadata structure proposed here is relatively flat and all data can – if desired – be exchanged in a single table. However, within this table, three types of resources are used to simplify data exchange and avoid unnecessary repetition:

  1. A few metadata items are requested for your entire institution or organization to help the planned system to attribute (or “brand”) information as coming from and belonging to your institution or organization. Note that most metadata elements are not applicable (“–”) for this resource type.
  2. Resources may be grouped into collections. These collections may reflect existing management procedures, different sources or authorship, etc. at your institution. Please view them as an opportunity to make the relation between resources better visible to customers. By allowing a significant amount of information to be given on the collection level, providing the information for individual resources may be less laborious. Where you find collections undesirable or a burden rather than an opportunity (e. g. in the case of your identification keys), simply create a single “catch-all” collection like “identification keys from x.y.” We do need at least one collection for each provider.
  3. The central items are the metadata for each individual resource. Whereas the information on provider and collections is used to supply search results with a context, the single resource metadata allow users to find appropriate resources.

To abstract the various methods of expressing multiple values, the term Multi-valued is added to fields that may potentially contain multiple values. For flat-table formats, multiple values should generally be separated by a semicolon. In RDF or xml, the element will often be repeated instead.
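
For flat-table formats, the semicolon convention can be handled with two small helpers. This is a minimal sketch, assuming cell values arrive as plain strings; the function names `split_multi_valued` and `join_multi_valued` are hypothetical.

```python
def split_multi_valued(value):
    """Split a flat-table cell on semicolons and drop empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]

def join_multi_valued(values):
    """Write a list of values back into a single flat-table cell."""
    return "; ".join(values)

print(split_multi_valued("en; it"))
print(join_multi_valued(["Williams (NHM)", "Smith"]))
```

Note that this simple approach cannot represent a value that itself contains a semicolon; such values would need to be rephrased or exchanged in a format that repeats the element instead.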

For the data exchange using Wiki-template style see here.

Metadata agreement

By sharing metadata like web address (= “URI”), titles and captions, keywords, taxon names, authorship, copyright and license, access to these resources is improved. Although ideally resources would be available directly on the internet (i. e. have a URI), this is not a requirement. Consortium partners will also benefit from information that helps to locate resources or resource versions (e. g., high quality images or sounds) that are not available directly, but may be available after direct negotiations.

The agreement tries to be permissive with respect to required data. Instead of “required” we use the term “urgently requested” (code: “R”) to indicate the request to make an effort to try to obtain this information. (Note: In some cases, however, it can be really required to supply a value for a field to make references possible, e.g. for resources with more than one MetadataLanguage or between resource, collection and provider. The field "Type" also is required to handle the data.) Other information is usually optional (code: “o”), some may be not applicable to some resource types (code: “–”). To account for the different resource types, the metadata concepts are annotated in the first columns of the following table using these codes.
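
The applicability codes lend themselves to a simple completeness report. The following is a minimal sketch, assuming records are Python dicts keyed by field name. The codes are copied from a few rows of the table further below, on the assumption that a leading “~” in a row stands for the deprecated provider column, which is omitted here. The column keys and the function name `missing_requested_fields` are hypothetical.

```python
# Column order follows the table: collections, identification tools,
# taxon pages / data sets, media resources (provider column omitted).
RESOURCE_KINDS = ["collection", "identification_tool", "page_or_dataset", "media"]

# "R" = urgently requested, "o" = optional; a few rows only, as an illustration.
APPLICABILITY = {
    "Type":                ["R", "R", "R", "R"],
    "Title":               ["R", "R", "o", "o"],
    "Description":         ["o", "o", "o", "o"],
    "Copyright Statement": ["o", "R", "R", "R"],
    "License Statement":   ["R", "R", "R", "R"],
}

def missing_requested_fields(record, kind):
    """List urgently requested ("R") fields that are empty for this resource kind."""
    column = RESOURCE_KINDS.index(kind)
    return [
        field
        for field, codes in APPLICABILITY.items()
        if codes[column] == "R" and not record.get(field, "").strip()
    ]

record = {
    "Type": "StillImage",
    "Title": "",
    "License Statement": "Available under Creative Commons by-nc-sa 2.5 license",
}
print(missing_requested_fields(record, "media"))
```

Because the agreement is permissive, the result is best treated as a list of reminders to the provider rather than as a reason to reject a record.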

In some cases it may be difficult to provide requested information. For example, although it is very important to have copyright and licensing information, it may not be feasible to research them for large resource collections. In such a case it is permissible to use “neutral” statements like “copyright by owner” or “Licenses will be individually negotiated”. The Spanish partner, for example, at the moment uses: "Licenses (except for use within the context of www.rjb.csic.es/floraiberica) must be individually negotiated" to avoid making commitments now. Another form that is essentially just as neutral, but holds out the prospect of reuse, may be “Reuse under a cc license will be considered after individual requests”.

Please consider using a Creative Commons license (http://creativecommons.org/license/) to help maintain the traditional sharing of scientific information in an increasingly legalistic world. A common form is the cc by attribution – non-commercial – share-alike license, meaning that the work may be modified and included in other non-commercial works, provided that the license is maintained (i. e. the new work is available to you under the same conditions) and that the source is cited and attributed.


Tabular list of metadata fields

The following table lists a number of metadata field names (“elements”) to agree upon. This is a first attempt to reach an agreement. As outlined below under “Extensibility”, you may add further fields.

The first columns explain the expected applicability of fields to different resource types. Please get back to us if the expectations expressed here do not match your situation. Urgently requested fields are marked “R”, optional fields “o”, and fields considered not applicable are marked “–”.

The proposal is based on Dublin Core, the IPTC/Adobe XMP standards, the discussions in the GBIF- and TDWG-led Multimedia Resource Task Group (MRTG) as well as specific discussions at the Key2Nature kick-off meeting in Trieste. Please review and criticize it.

(In the narrow columns on the left side the applicability of metadata fields is given with respect to some resource types.)

↓ Applicability of metadata field for data providing institution, organization, or individual: deprecated because data providers must have a Wiki page
↓ Applicability for each collection of identification tools, images, or other resources
↓ Applicability for identification tools, or eLearning packages
↓ Applicability for taxon pages, glossary pages, maps, or data sets
↓ Applicability for media resources (still image/audio/video/etc.)
Metadata fields ("elements"):
Field derived from dc:type:
R
R
R
R
R
Type Provider (fixed value for provider metadata), Collection (fixed value for collections), StillImage, Sound, MovingImage, Map, etc. See accompanying value list further below in this document.
~
o
o
o
o
Subtype A more fine-grained specification of types of resources. For images, this defines an extension of Dublin Core Resource Type:StillImage. Intended to distinguish between different types of content representation like “line drawing, grayscale drawing, color drawing, grayscale photo, color photo”. Occasionally a distinction between normal, infrared, fluorescent, light-microscopic, TEM (transmission electron microscope) or SEM (scanning electron microscope) photos may also be desirable. Both field name and scope should be further discussed! Under discussion: should this include digitization techniques, or should they be separate?
~
R
Interactivity Applicable only to interactive resources such as identification tools. For such tools we currently use the following vocabulary: “Static”, “Hyperlinked”, and “Dynamic”. Both single-access (= dicho-/polytomous) and multi-access keys may or may not be presented dynamically (also called “interactive” or “adaptive”). “Hyperlinked” is intended for HTML or PDF hypertext documents limited to simple links (connecting parts of the key or leading to external resources).
~
R
Offline Use Applicable to various resources, but especially relevant to interactive resources such as identification tools. Many such tools require a constant online database connection, making them unsuitable in situations where no internet connectivity is available. Vocabulary: “yes” or “no”.
~
R
Host Application Applicable primarily to interactive resources: The software necessary to use an interactive resource (e. g., an identification tool), whether this software is distributed with the tool or not. Examples: “Web browser”, “PDF Reader”, “Lucid Player”, “Linnaeus II player”. Use the value “Custom” if the identification tool is uniquely coupled with custom-programmed software that has no independent name. See also the section “Value lists” below for further examples.
~
R
Target System Applicable primarily to interactive resources: The operating system or virtual machine that must be present to use the resource. Example values are: “Java”, “.NET”, “Mac”, “Windows XP and later”, “Linux”, “PDA”, “Smartphone”, “Web” (a special value where only standard web browser functionality is expected); further values may be added as necessary, especially where specific OS versions are targeted. If a single resource is provided for multiple hardware or operating systems, this may be Multi-valued. Do not add “Mac”, “PC”, “Linux”, etc. where generic host applications (like Web browser or PDF reader), or virtual machines (like Java or .NET) are targeted. See also the section “Value lists” below for further values.
Fields derived from dc:title:
R
R
R
o
o
Title Concise title, name, or label of institution, resource collection, or individual resource. This field should include the complete title with all the subtitles, if any. This field will be the primary basis on which users will select and recognize resources. If you have no “real” title – as frequently occurs with images – please try to generate one. Often the taxon name(s) will form a good substitute title; or the file name itself may contain title-like information.
R
o
o
o
Logo The resolvable URI of an icon or logo image representing the current resource (especially a service provider or resource collection, but some publications may have logos of their own). In practice, having a logo for a service provider is important if another resource such as an image makes an attribution to a collection or service provider. Entering the Logo URI into a browser should only result in the icon (not in a webpage including the icon). Same as friend-of-a-friend-term: foaf:logo.
Fields derived from dc:description:
o
o
o
o
o
Description Description of collection or individual resource, containing the Who, What, When, Where and Why as free-form text. It optionally allows detailed information to be presented and will in most cases be shown together with the resource title. If both description and caption (see below) are present, a description is typically displayed instead of the resource.
~
o
Caption As an alternative or in addition to description, a caption is free-form text to be displayed together with (rather than instead of) a resource (especially images). Where, as is typical, only one of Description or Caption is present in existing metadata, the most appropriate concept should be chosen; the information should not be duplicated in both fields.
o
o
o
o
Content Modification If media content has been modified or edited significantly in ways that are not immediately obvious or expected to consumers, this must be documented and explained. Examples for images are: Blurring the background, removing a distracting twig, moving an object to a different surrounding, or changing the color in parts of the image. Modifications that are standard practice and expected or obvious to users are not necessary to document. Examples of such expected modifications for images are: Changing resolution, cropping, minor sharpening or overall color correction, clearly perceptible modifications (adding arrows or labels, combination of multiple pictures into a table). If it is only known that significant modifications were made, but no details are known, a general statement like “Media may have been manipulated to improve appearance” may be appropriate.
o
o
o
o
Creation Technique Free-form text describing the techniques used to prepare the subject prior to or while creating the media resource. Examples for such techniques are: Insect under CO2, cooled to 4 °C, preservation with ethanol or formaldehyde, multiflash lighting, remote control, automatic interval exposure.
Fields derived from dc:identifier:
o
R
R / o
R / o
R / o
Resource ID A unique identifier code (number, alphanumeric code, URI, etc.) for a media item or a collection resource. Ideally, this would be a globally unique identifier (a numeric GUID like “D27CDB6E-AE6D-11cf-96B8-444553540000” or a URI like “http://xyz.edu/images/123.jpg” or “urn:lsid:xyz.edu:abc:123”). However, for images or other media, it is sufficient if the identifier is unique within provider and collection (i.e. each of your collections can use 1, 2, 3 as identifier for the first resources).

It is desirable to choose identifiers that are stable over time (remain the same when the metadata are updated). Furthermore, consider that they may simplify communication about resources by helping to identify individual data items in the original or intermediate data repositories. This is especially desirable for resources that are not online and thus not uniquely identified by Best Quality URI.
Please make an effort to provide a stable and persistent identifier, that does not change when you update your data. Without such an identifier, the metadata aggregation is difficult and inefficient.
Some special notes: a) For Type = "Collection" this field is required and must be globally unique. For the data collection on the wiki, where each collection has its own page, you can, however, still leave it empty; your wiki resource collection will then be the identifier (and collection membership is expressed using the parameter Collection Page, which is only available on the wiki and is not part of this general agreement). Outside the wiki, the Resource ID of a collection must be a stable URI. Collection membership is expressed by setting the field "Collection By Resource ID", see further below, to this value.

b) For normal media resources the field may be omitted if it is too difficult for you to provide it. However,

c) for media resources with metadata in multiple languages (e. g., a title in English and Slovenian), the Resource ID is required again. It then provides a unique identifier to recombine language-specific metadata into a single record.
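
The recombination described under c) can be sketched as a simple grouping step. This is only an illustration, assuming one flat-table row per language with the fields Resource ID, Metadata Language, and Title; the function name `merge_by_resource_id` and the sample values are hypothetical.

```python
from collections import defaultdict

def merge_by_resource_id(rows):
    """Group language-specific rows into one record per Resource ID."""
    merged = defaultdict(dict)
    for row in rows:
        # Key each title by its (single) Metadata Language code.
        merged[row["Resource ID"]][row["Metadata Language"]] = row["Title"]
    return dict(merged)

# Placeholder rows: the same resource described in English and Slovenian.
rows = [
    {"Resource ID": "123", "Metadata Language": "en", "Title": "English title"},
    {"Resource ID": "123", "Metadata Language": "sl", "Title": "Slovenian title"},
]
print(merge_by_resource_id(rows))
```

Without a shared Resource ID the two rows could not be recognized as describing the same resource, which is why the field is required in this case.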

Resources that exist in multiple quality levels may report multiple URIs:
~
R
R
R
Best Quality URI Best available quality (which may be non-digital or offline). Use this field if only one quality level is available (as it is typical for taxon pages or keys!) This may also be a published CD, etc. For resource search systems to function this must refer to the resource itself, if the resource is online at all (e. g., the URI should point to the image, not to a html-page containing the image - the latter is very desirable to give under Page Context URI, see further below). If your resource is available only after authentication, you may give a direct, unique URI to the resource here. However, use Page Context URI for login or portal pages that control access (a URI is a portal URI if it is the same for several resources).

Note: Availability (online, online with login, off-line published, off-line unpublished, etc.) must be given in separate fields; for Best Quality URI under Best Quality Availability, for other quality levels under corresponding names.
~
o
Good Quality URI Quality intended for resources displayed as primary information; e. g., an image between 800 and 1200 px
~
o
Medium Quality URI Intermediate quality, e.g. shortened or using a higher compression causing moderate artifacts.
~
o
Lower Quality URI A smaller/shorter quality that still contains significant information like a 3-5 second birdsong, an image around 150-300 px, etc. Typical for information displayed in a series, e.g. a list of images of states of a character.
~
o
Normal Preview URI Preview, not normally sufficient as an information source in itself, e.g. a short 3 second clip of a bird song, or an image thumbnail perhaps 80-160 px large.
~
o
Tiny Preview URI A yet smaller preview, e.g. for images a thumbnail less than 80 px large.




Certain related resource formats may be given directly: Page Context URI and exchange formats:
~
o
o
o
Page Context URI A URI that, when opened in a web browser, shows the resource in the context of a web application (which may be an html page with or without Javascript, FLEX, Silverlight, etc.). This is very often a desirable item to resource providers. It displays the resource in the design chosen by the provider and together with its metadata, copyright and licensing information. Recommended best practice: if available, metadata search clients should prefer this URL when linking to a resource.
~
o
Exchange 1 URI
Exchange 2 URI
Exchange 3 URI
Whereas rarely applicable to media resources (which typically use interoperable formats), for identification keys or datasets it is highly desirable to specify an exchange/export format. It is possible to give offline-URIs for this. However, giving online exchange formats for online resources is highly appreciated. Each exchange URI should be accompanied by a format value (Exchange 1 Format, Exchange 2 Format, Exchange 3 Format); see below.
Fields derived from dc:relation:
~
R
R
R
R
Service Attribution URI A URI that identifies the primary provider of either the data or metadata, whichever is desired and agreed upon by media resource and metadata provider (Example: "http://www.keytonature.eu/wiki/Julius Kühn Institute - Federal Research Institute for Cultivated Plants"). Client software displaying the results of metadata searches is requested to display the following attributions for each resource (if available): resource creator, copyright owner, collection context, and the Service Attribution URI. If the Service Attribution URI matches a homepage of a data or service provider record, clients are requested to display the title, description, or logo of that record in addition to the URI. Note: For wiki-based data uploading, this field is not used. Instead, in the Metadata template there is a field called "Provider Page". In this case the value of this field should be the name of the wiki page of the provider (Example: "Julius Kühn Institute – Federal Research Institute for Cultivated Plants").
~
o
o
o
o
Secondary Service URIs This optional item allows providing attribution and reference to secondary aggregation or service nodes which may be of interest in addition to creator, copyright owner and the primary Service Attribution URI. (Multi-valued).
~
R
R
R
R
Collection By Resource ID A collection that the present object is a member of. This will normally be media, but collections may be members of collections. The relation is expressed through the value of the Resource ID of a resource with Type="Collection". The values of these fields must match precisely. Images, sounds or taxon pages should belong to a collection. Example: "http://www.keytonature.eu/wiki/Neotropical Smut Fungi" (if a collection object with such an ID exists). For wiki-based metadata submissions, this field is not used. Instead, the Metadata template provides a field "Collection Page"; the value of this should be the name of the wiki page of this collection (Example: "Neotropical Smut Fungi").
Fields derived from dc:rights:
~
o
R
R
R
Copyright Statement Information about rights held in and over the resource. This should be a full-text, readable copyright statement, as required by the national legislation of the copyright holder. On collections, this applies to all contained objects, unless the object itself contains a different statement. Examples: “Copyright XY 2008, all rights reserved”, “© XY Museum 2008”. Do not place just the name of the copyright holder (“XY Museum”) here!
~
R
R
R
R
License Statement The license statement defining how resources may be used. Example: "Available under Creative Commons by-nc-sa 2.5 license". Information on a collection applies to all contained objects unless the object has a different statement. This statement may also inform on the commercial availability of items. Buying an identification tool or media resource is essentially the purchase of an individual license. Examples for such License statements: “Available through bookstores” for a commercially published CD; “Individual licenses available for purchase” for a high-resolution image (note that the medium or low resolution levels of the same image may be available under Creative Commons). Same as dcterms:license.
~
R
R
R
R
License URL A web page corresponding to and elaborating the textual License statement. This may be a standard Creative Commons License URL, or a custom WebStatement.
~
R
R
R
Best Quality Availability Availability of Best Quality URI. This contains values from a constrained vocabulary (examples: "online (free)", "online (login)", "unpublished (digital)", "published (digital)", "published", "unpublished").
~
o
Good Quality Availability
Medium Quality Availability
Lower Quality Availability
Normal Preview Availability
Tiny Preview Availability
Availability of Good Quality URI to Tiny Preview URI. In contrast to Best Quality Availability, these are optional.
~
o
o
o
Rating Provider-supplied rating for the media object, expressed as 1 (lowest) to 5 (best).
~
R
R
R
R
Copyright Owner The owner of the copyright. (Note: ALA uses dc:publisher for this purpose, but it seems doubtful that the publisher is by necessity the copyright owner, publisher may only hold a license instead.)
Fields derived from dc:creator:
~
o
o
o
o
Creators Creator(s) of resource (for images: the photographer, not the digitizer). Ideally just the name(s), but it may also contain a more elaborate credit text. Avoid using commas: Do not invert names into “Lastname, given name” and use parentheses for localities: “Williams (NHM)” instead of “Williams, NHM”. (Multi-valued)
~
o
o
o
o
Contributors Person(s) or organization(s) that contributed to the creation of the media resource. The same rules as for Creators apply here (Multi-valued).
~
o
o
o
o
Attribution Statement Free text for "please cite this as …"; this should only be used if the information is not already in Copyright, Creators, etc. If both Credit Line and Creators are present, Credit Line may be shown in preference to Creators if available space does not permit showing both. Same as IPTC "Credit Line".
~
o
o
o
o
Attribution Logo URL The URL of icon or logo image to appear in source attribution. -- Entering this URL into a browser should only result in the icon (not in a webpage including the icon).
~
o
o
o
o
Attribution Link URL The URL where information about ownership, attribution, etc. of the resource may be found. -- This URL may be used in creating a clickable logo. Providers should consider making this link as specific and useful to consumers as possible, e.g. linking to a metadata page of the specific image resource rather than to a generic page describing the owner or provider of a resource.
~
o
R
o
o
Metadata Creator Creator(s) or editor(s) of title, description, keywords, etc. This should not be a person simply typing existing content, or converting digital formats (but such a person may be added if substantial editing changes were necessary). (Multi-valued)
~
o
o
o
o
Metadata Manager Name or contact information for persons or institution involved in the management of metadata; this may include persons responsible for data entry, management, conversions, etc. (Multi-valued).
~
o
o
o
o
Metadata Copyright Owner The person or institution owning the metadata and capable of licensing them (Multi-valued). Related to dcterms:rightsHolder (however, it is unclear whether in some jurisdictions, owner and holder may be different entities).
o
o
o
o
Reviewer Names Name(s) of the reviewer(s), if the media item has been reviewed with respect to its content and the metadata. This may include editorial review, if the editor is an expert in the technical or subject area. If the reviewer wishes to remain anonymous, the fact that a peer review has been performed is expressed by adding “anonymous” as a substitute for a name. (Was: Peer Reviewer Names) (Multi-valued)
o
o
o
o
Reviewer Comments Free-form text: any comment provided by an expert in the subject featured in the media item who acted as a peer reviewer. (Was: Peer Reviewer Comments)
Fields derived from dc:language:
~
o
R
o
o
Language Language(s) of resource itself. One of "zxx" for language-neutral images/nature sounds; ISO language codes (e. g., "en; it") if the resource is specific to one or several languages; or "und" for resources specific to an unknown/undefined language. (Multi-valued)
~
R
R
R
R
Metadata Language Language of description and other metadata (but not necessarily of the image itself). The metadata language should be a single language code, not a list! Please try to structure your data accordingly. (not Multi-valued)
Fields derived from dc:date:
~
o
o
o
o
Original Creation Date The date at which the first version of the resource was originally created, e.g. an image captured on film. For photographs this should usually be the capture date (i.e. simple image manipulations should not be considered creation events). However, for works of art, the finishing date should be given here (biologically meaningful observation dates for these may be expressed by referencing a specimen or observation, or expressing a “Derived From” relation to, e. g., a photograph). In the case of originally non-digital media this should be left empty if only a digitization date is known (see below). Use the international (xml) format yyyy-mm-ddThh:mm (e. g., "2007-12-31" or "2007-12-31T14:59"). If possible, timezone information should be added. @@Ranges are under discussion. Note: Although the creation date will often be related to a temporal coverage for many pictures of living organisms, this relation may break in the case of fossils or specimen images, where the period to which the object relates may be considered the coverage.
~
o
o
o
o
Accession Date Although the original creation date is most useful, the accession date in an earlier collection may give partial information about the minimum age of a media item. Formatting and notes like Original Creation Date.
~
o
o
o
o
Digitization Date Date the first digital version was created, where different from Original Creation Date. This is often *not* the file creation or modification date, which often only captures the last format change or processing. Use the international date format (see above).
o
o
o
o
o
Modified Point in time when the last change to the data itself occurred, e.g. last version update of a software identification tool. In the cases of media this information often does not need to be supplied since it can be detected from the file date through the internet. Please provide this information whenever this is not possible. Same as dcterms:modified applied to data.
o
o
o
o
o
Metadata Modified Point in time when the last change to metadata occurred. (The last modification of the media content is not recorded here; it is generally assumed to be present in the file information itself.) Same as dcterms:modified applied to metadata.
Fields derived from dc:coverage:
~
o
o
o
o
Audience The intended audience. Examples are: 12 yr old school children, 6th grade, university students, general public, experts, custom officers, farmers. Especially relevant for ID tools.
~
R
R
R
R
World Region World region classification, such as continent, waterbody, island group, or island names, preferred from a controlled vocabulary. Only a single term should be used here, in the language of your metadata. This may be "Global", “Europe”, “Australia”, “Baltic Sea”, etc. Do not use country codes or names, for which specific fields exist.
~
R
R
R
R
Common Geoarea Name (deprecated) The first versions of this agreement had an option to select the best region fitting to resources. The term "World Region" should now be preferred. Old definition: "The single highest geographic area, in the language of your metadata; e. g., country name or name of a national park. This may be "Global", “Europe”, “Germany”, “Oceans”, etc. Do not use country codes, but spelled out names in the metadata language! This should always be present if Locality is present; Locality providing further details. (not Multi-valued)"
~
R
R
R
R
Country Names The geographic location of the specific entity documented by the media item, expressed through the names of countries. Where possible, the standard vocabulary of ISO country codes is preferable. (Multi-valued).
~
R
R
R
R
Country Codes The geographic location of the specific entity documented by the media item, expressed through a constrained vocabulary of countries using 2-letter ISO country code (e. g. "it, si"). Accepted exceptions to be used instead of ISO codes are: "Global", "Marine", "Europe", “N-America”, “C-America”, “S-America”, "Africa", “Asia”, “Oceania”, “Arctic”, “Antarctic”; this list may be extended as necessary. This should always be present if Common Geoarea Name is present (Multi-valued).


~
o
o
o
o
State or Province Optionally, the geographic unit immediately below the country level (individual states where the country is a federation, provinces, or administrative units) in which the subjects (e. g., species, habitats, or events) were recorded by the media (if such information is available in separate fields).
~
o
o
o
o
County or Subprovince Optionally, the counties, subprovinces, or sub-administrative units in which the subjects were recorded by the media (if such information is available in separate fields).
~
o
o
o
o
City or Place Name Optionally, the name of a city or place commonly found in gazetteers (such as a mountain or national park) in which the subject (e.g., species, habitats, or events) was recorded by the media; that is, a named geographic level below the country, where such information is available in separate fields.

~
o
o
o
o
Locality Actual detailed geolocation of observation (city, location details down to the village, forest, etc.). Do not repeat previous higher geography information (World Region, Country, State or Province, County or Subprovince, City or Place Name) here.
~
o
o
o
o
Geocoordinates Latitude and longitude of geographic coordinates. Either decimal representation (use "." as decimal point) or degree-minute-second notation (use ′ for minutes and ″ for seconds) may be used. End the latitude with N or S, or prefix the value with “+” for northward and “−” for southward. End the longitude with the letters E or W, or prefix the value with “+” for eastward and “−” for westward. Use the comma (“,”) to separate latitude from longitude. If positive/negative values are used instead of the letters, it is essential to place the latitude first; otherwise it is recommended. A geodetic datum (such as WGS84 used for GPS measurements) may optionally be added in parentheses at the end. Examples: “27°59′16″N, 86°56′40″E (WGS84)” or “+49.5000°,-123.5000°” (for decimal degrees and using positive/negative values).
~
o
o
o
o
Elevation Elevation (height of ground level above mean sea level) of observation position. For human-held digital cameras (recording GPS-based height) it is permissible to use the position of the camera instead. A geodetic reference datum may be added in parentheses. Elevation is often also called altitude, elevation being the more correct term.
~
o
o
o
o
Depth The depth or range of depth at which the media was recorded. Quantitative expressions including measurement units are preferred.
~
o
o
o
o
Compass Heading The compass heading (direction) that the camera is pointing towards, expressed either in degrees with “0” or “0°” being North, “90” or “90°” being East, or as compass readings such as “NNW” or “North-North-West”.
Fields derived from dc:subject:
~
R
R
R
R
Subject Category Constrained vocabulary of subjects, aiding with search capabilities. For organisms, this may include major taxonomic groups like vertebrates, fungi, etc., but the vocabulary may also include non-taxonomic terms like “ecosystem”, “forestry”, “aquatic vertebrates”. Provider-specific controlled vocabularies may be used which need to be mapped on integration. (not Multi-valued)


~
o
o
o
o
General Keywords Keywords or "tags". This may contain subject terms for which no more specific field is provided (e. g., “flower diagram”). However, if a metadata provider cannot assign its keywords to the more specific categories provided below, it may also contain those (e. g., scientific taxon, geographic, or organism part names)(Multi-valued).
~
o
o
o
o
Setting The setting of the content represented in a medium like images, sounds, movies. Constrained vocabulary of: “Natural” = unmodified object in a natural setting (e. g., living organisms in their natural environment); “Artificial” = unmodified object in an artificial setting (e. g., living organisms in an artificial environment: zoo, garden, greenhouse, laboratory, photographic background); “Preserved” = artificial setting of dead or preserved organisms (e. g., images of specimens in a museum).
~
R
R
R
o
Taxonomic Coverage The lowest taxon integrating all taxa covered by a resource or resource collection (e. g., the name of the family from which several genera are keyed out; “Aves” for a bird key or a bird image collection). Do not add a rank (like “Class” in “Class Aves”). This field is intended to allow coarse navigation of resources, similar to taxonomic terms in Subject Category. However, whereas "Subject Category" is limited to a controlled vocabulary, "Taxonomic Coverage" may be specified at any level of detail desired. Do not repeat a single taxon name here - limit this to "Scientific Names" and leave "Taxonomic Coverage" empty. (not Multi-valued) (Deprecated old name: "Lowest Common Taxon").
~
o
o
o
Scientific Names Scientific names of organisms, minerals, soils, etc. represented in the media resource (Multi-valued). For common, natural language names see "Common Names" below. If possible, add this information even if the title or caption already contains scientific names. Where the list of scientific names is impractically large (e. g., media collections or identification tools), the number of taxa should be given in Taxon Count (see below). If possible, please do not repeat the Taxonomic Coverage value (formerly "Lowest Common Taxon") here and do not use abbreviated genus names (“P. vulgaris”).
~
o
o
o
Identified By The name(s) of the person(s) who applied the Scientific Name(s) to the resource.
~
o
o
o
o
Taxon Count Please give an exact or estimated number of specific taxa that are featured in the resource. This is especially desirable if a complete list of taxa is not available or practical. Please try to give this information even where not required. The count should best contain only the taxa covered fully or primarily by the resource. For a taxon page and most images this will be “1”, i. e. other taxa mentioned or in the background should not be counted. However, sometimes a resource may illustrate an ecological or behavioral entity with multiple species, e. g., a host-pathogen interaction. This should be a single integer number. Leave the field empty if you cannot estimate the information (do not enter 0).
~
o
o
o
o
Cultivar or Race Count

Infraspecific Taxon Count
Species Count
Infrageneric Taxon Count
Genus Count
Suprageneric Taxon Count
   Phylum Count
   Class Count
   Order Count
   Family Count

If desired, rank-specific taxon counts may be given in addition to Taxon Count. Suprageneric Taxon Count includes all higher taxonomic ranks above (but not including) the genus, infrageneric taxa are the ranks between genus and species (not including either), and infraspecific ranks include subspecies, variety, forma.

The content of these fields should be single integer numbers. The sum of these detailed fields should be equal to Taxon Count.

~
o
o
Scientific Name Synonyms If alternative scientific names other than the accepted are available (according to the current opinion of the metadata provider) it is desirable to provide them here. Synonym may be interpreted in a wide sense, including misspellings or spelling variants and perhaps even mis-identifications. The relation to scientific name is unambiguous only if only a single scientific name is given; only then the information can be harvested for synonymy purposes. For multiple scientific names the semantics of Scientific Name Synonyms is only "other names that may be applicable to the current resource" (Multi-valued).
~
o
o
Common Names Common (= vernacular) names of the subject (Multi-valued). If possible, the ISO language code applicable to the natural language in which the name is given should be added in parentheses behind the name. Example: "abete bianco (it); Tanne (de); White Fir (en)". If names are known to be male- or female-specific, this may be specified as in: “ewe (en-female); ram (en-male)”. (Deprecated synonymous term name: "Vernacular Names").
~
o
o
Associated Specimen ID
Associated Observation ID
IDs of a specimen that may be created from the object presented in the media

Museum or Collection@@@Catalogue number@@@ Free-form text specifying that a resource documents some aspect (habitat, morphology, behavior, organism interaction) of specimens preserved in museums or culture collections (“strains”), or observations recorded in observation databases. Examples: for NHM “BM 23974324” (barcoded) or “BM Smith 32” (non-barcoded specimen); for UNITS: “TSB 28637”; for PMSL: “PMSL-Lepidoptera-2534781”. Where available, URIs (including LSID).

~
o
o
Association Info Free-form text expressing specific aspects (habitat, eye color, mating behavior, organism interaction) that the media documents or expresses for the associated image
o
o
o
o
Subject Part The part being represented in the resource: head, antennae, tarsus, anthers, etc.
o
o
o
o
Subject Sex The sex of the subject: male, female, hermaphrodite, etc.
o
o
o
o
Subject Life Stage The development stage of the subject. This may encompass continuously changing stages (juvenile, adolescent, mature, senescent), seasonally influenced stages (winter stage of deciduous tree, flowering, fruiting, spring or summer morph of certain butterflies) as well as discontinuous stages (egg, larva, imago). (Multi-valued)
o
o
o
o
Subject Orientation Specific orientation (= direction, view angle) of the subject represented in the media resource with respect to the acquisition device. Examples: "dorsal", "ventral", "frontal", etc. No formal encoding scheme as yet exists, values should be in metadata language.
Field derived from dc:source:
~
o
o
o
o
Published Source Optional field to give attribution to a previous digital or printed publication of media (images, documents, identification key). This may refer to multiple publications by the same copyright holder (attribution given at the discretion of the holder), copying under license (attribution often required), or digitization of printed material. Do not put generally "related" publications in here. This field may contain a free-form text description of the publication or it may be a URI (“digitally-published://ISBN=961-90008-7-0”) if this resource is also described separately in the data exchange. (Single-valued)
o
o
o
o
Derived From If a resource is derived from another resource, involving significant modification, the resource should refer to the source resource using an appropriate identifier (URI, DOI, etc.), or – if not available – a human-readable reference. This is not intended for quality level changes (e. g., creating thumbs of images). Knowing derivations is of special interest for identification tools (e. g. a key from an unpublished data set, as in FRIDA, or a PDA key from a PC or web key) or web services (e. g. a name synonymization service being derived from a specific data set). It may very rarely also be known where one image or sound recording is derived from another (but compare the separate mechanism to be used for quality/resolution levels).
Field derived from dc:format:
~
o
o
o
o
Format The generic, URI-independent field "Format" is currently maintained for cases where all quality levels use the same format (it is uncertain whether it will be maintained in the future). However, high quality and low quality versions may use different formats, e.g. png and jpg, and Page Context URI and exchange URIs (e.g. for identification keys) usually require different formats. See below.
~
o
o
o
Best Quality Format
Good Quality Format
Medium Quality Format
Lower Quality Format
Normal Preview Format
Tiny Preview Format
Page Context Format
Exchange 1 Format
Exchange 2 Format
Exchange 3 Format
Format for the resource at a given quality level. Formats may vary among different quality levels (png-image for high quality, jpg-image for lower). Specifying the format explicitly is necessary only for offline and digital, or if the URI does not include an extension (e. g., "x.de/rsc/123" may be an image, a sound, or a web page). Four types of values are acceptable: (a) any MIME type; (b) commonly understood file extensions like txt, doc, odf, jpg, png, pdf; (c) the following special values: Data-CD, Audio-CD, Video-CD, Data-DVD, Audio-DVD, Video-DVD; (d) for exchange formats for identification keys, special values like “DELTA”, “SDD”, “NEXUS”, “Sybase DB”, “Filemaker DB”, “Comma separated values” (Multi-valued).
~
o
o
o
Best Quality Size
Good Quality Size
Medium Quality Size
Lower Quality Size
Normal Preview Size
Tiny Preview Size
Page Context Size
Exchange 1 Size
Exchange 2 Size
Exchange 3 Size
Size of file or stream, in bytes, to which the URI for a given quality level resolves, given as a simple integer value (without kB, MB, etc.). Although easily available after retrieval, making this available in the form of metadata allows consuming applications to decide if they should retrieve a particular resource, and to give a hint as to how long this might take. This information is welcome if readily available, but can otherwise in most cases be automatically inferred by metadata aggregators. Same as ORE.Extent, refines dc:format and dcterms:extent (which may be size or duration). Note: image width/height in pixels and dynamic media (movie/sound) duration in seconds are separate properties.
~
o
o
o
Best Quality Image Width
Good Quality Image Width
Medium Quality Image Width
Lower Quality Image Width
Normal Preview Image Width
Tiny Preview Image Width
Best Quality Image Height
Good Quality Image Height
Medium Quality Image Height
Lower Quality Image Height
Normal Preview Image Height
Tiny Preview Image Height
For images or movies: width or height in pixels; as an integer value. Note: for movies, the duration in seconds is a separate property.
~
o
o
o
Best Quality Duration
Good Quality Duration
Medium Quality Duration
Lower Quality Duration
Normal Preview Duration
Tiny Preview Duration
The duration of the resource when played under standard conditions, in seconds. This must be a simple integer value, without "s", "h", etc. Applicable to dynamic media like sounds or movies.


Specific fields for identification keys:
~
R
ID Tool Structure Fixed values: “Dichotomous” (single-access key with branching limited to two leads), “Polytomous” (single-access key with at least occasionally more than 2 leads), “Multi-access” (the sequence of characters or leads can be freely chosen by the user), “Multi-entry” (in a first step, a free choice of multiple characters is available, followed by a single-access or browsing structure), “Browsing” (descriptions or images arranged in a long sequence like field guides). If an identification tool contains several different keys, this may be Multi-valued.

General notes

  1. Do not enter “empty”, “no”, or “-“ in fields to indicate that they contain no information or are inapplicable.
  2. Items that are on sale should use both an appropriate URI-equivalent (e. g., “digitally-published://”, see “Availability and URI notation”, below) and express the availability for purchase in the License field (e. g. “Individual licenses may be purchased through bookstores”).
  3. Media resources like images or sounds are typically available in different quality levels (e. g., high-resolution, web-optimized, or different thumbnail sizes), under different URIs. We use a “denormalized” structure, i.e. for each quality level fields for URI, Availability, Size, Image Width, Image Height, Duration, Format, License, etc. exist.
    • If your resources exist only in a single quality level (typical for identification keys or taxon pages), use only the fields Best Quality…, ignoring all fields starting with Good…, Medium…, Lower…, Normal Preview…, or Tiny Preview…
  4. Sometimes resources differentiate between “title” and “subtitle”. A subtitle may be a second sentence of the title, or it may rather be a longer description of the product. To simplify the structure of the web interface, the title field should be a complete, human-readable representation of the item. For example, in two resources with “Title = Flora of Erehwon; Subtitle = Gymnosperms” and “Title = Flora of Erehwon; Subtitle = Angiosperms”, the subtitle should be added to the title field to generate a usable title (“Flora of Erehwon. Gymnosperms”). However, occasionally the subtitle contains one or two sentences describing details of the resource and belongs in the Description field.
  5. The copyright and license statements should be complete statements. "Copyright Statement=M. Name" is not a complete copyright statement; "Copyright Statement=© M. Name 2007" is a valid copyright statement. It is very tempting to leave out the “copyright” part of the statement if the field is already labeled as such. However, the web interface cannot possibly know the correct way to express copyright in different languages and legal systems, so you need to provide complete statements.
  6. For most exchange formats (e. g., Wiki, Database formats, tab-separated Unicode text), the sequence of metadata fields does not matter and you can rearrange the fields in a sequence different from the one in the table above.
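
The denormalized structure described in note 3 can be illustrated with a minimal Python sketch. It assumes that per-level field names are formed as "{quality level} {property}" with a separating space (as in "Best Quality URI"); the property list is taken from note 3, and both function names are hypothetical helpers, not part of this agreement.

```python
# Quality levels named in this agreement, from best to smallest.
QUALITY_LEVELS = [
    "Best Quality", "Good Quality", "Medium Quality",
    "Lower Quality", "Normal Preview", "Tiny Preview",
]

# Per-level properties listed in general note 3 (assumed to be the full set).
PROPERTIES = [
    "URI", "Availability", "Size", "Image Width",
    "Image Height", "Duration", "Format", "License",
]


def quality_level_fields():
    """Return every denormalized field name, e.g. 'Best Quality URI'."""
    return [f"{level} {prop}" for level in QUALITY_LEVELS for prop in PROPERTIES]


def levels_in_use(record):
    """Return the quality levels for which a record supplies a non-empty URI."""
    return [
        level for level in QUALITY_LEVELS
        if record.get(f"{level} URI", "").strip()
    ]
```

A resource that exists in a single quality level would then report only "Best Quality" from levels_in_use, in line with the advice under note 3.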

Multilingual Metadata

Some providers have metadata such as title, descriptions, keywords in more than one language. It is highly desirable to provide your native language to the metadata index and not only translations.

The metadata exchange format is intended to allow more than one data row for each resource. Each row must be distinguished by the language in MetadataLanguage, and has information for title, caption etc. in the corresponding language. The rows are kept together by the resource identifier. For collections, this is the CollectionID (required!), whereas for identification tools, taxon pages and media resources this is the BestQualityURI.

An example is given in the following table, showing metadata for a single media resource in two languages. Please do not add new metadata fields like “TitleEn” or “TitleFr”.

Type Title MetadataLanguage BestQualityURI
StillImage Oak infected with powdery mildew en http://x.y.net/images/123
StillImage Mit Mehltau infizierte Eiche de http://x.y.net/images/123

The first resource URI in such cases will be the same and serve to keep the multilingual metadata together. Note that occasionally closely related resources may have different URIs in different languages (e. g. if an image contains language-specific text). In such cases the field “Language” (which is different from MetadataLanguage) should also be set to the language of the resource itself.

Note that whenever possible MetadataLanguage should only be a single value. Note that it is not necessary to provide all metadata fields in all languages. Simply create the record in a secondary language only with those fields that are available (but do use the BestQualityURI field in all records, so that the records can be reconnected when converting and integrating the data).

Note: Experience with the first metadata collection in the Key to Nature project showed that it has not become clear enough that a BestQualityURI should always be provided whenever possible, even if the metadata are not multilingual or if the resource is non-digital, unpublished, or both. To support the recombination of multilingual metadata, a unique URI is required.
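
How a consumer might reconnect multilingual rows can be sketched as follows. This is only an illustration in Python, assuming rows are available as dictionaries keyed by the field names used in the example table (Type, Title, MetadataLanguage, BestQualityURI); the function name is hypothetical.

```python
from collections import defaultdict


def group_by_resource(rows, key="BestQualityURI"):
    """Group metadata rows by resource identifier, then by MetadataLanguage.

    For collections, key="CollectionID" would be used instead.
    """
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row[key]][row["MetadataLanguage"]] = row
    return dict(grouped)


rows = [
    {"Type": "StillImage", "Title": "Oak infected with powdery mildew",
     "MetadataLanguage": "en", "BestQualityURI": "http://x.y.net/images/123"},
    {"Type": "StillImage", "Title": "Mit Mehltau infizierte Eiche",
     "MetadataLanguage": "de", "BestQualityURI": "http://x.y.net/images/123"},
]
resources = group_by_resource(rows)
```

Both rows end up under the single key http://x.y.net/images/123, which is why the identifier must be present in every language record.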

URIs and Availability (including “pseudo-URIs” for non-internet resources)

Temporary note for users of the first version of this agreement: The system has changed. Please do not encode the availability into pseudo-URL protocols (like "login-http") any longer. Please fill the additional field "Best Quality Availability", etc. See Resource Availability for further information.

Value lists (= content standards)

For several data fields we will be using “value” (or “content”) standards.

  • Type: See Resource Type for supported values and comments.
  • Subtype: See Resource Subtype for supported values and comments.
  • Language/MetadataLanguage: See ISO Language Codes.
  • Country Codes: See ISO Country Codes.
  • Subject Category: See Subject Category.
  • MIME Format: In addition to common file-extensions recognized by browsers, any MIME code (see http://www.iana.org/assignments/media-types/) may be used here. (Note: This is necessary only if the format of a digital resource cannot be inferred from its URI. If your URI ends in common file extensions like “.jpg/.jpeg”, “.png”, “.gif”, “.tif/.tiff”, “.mpg/.mpeg”, “.htm/.html”, ”.pdf/.doc/.txt/.odf”, etc. this field may be left empty.)
  • Protected online, offline digital and non-digital availability: See Resource Availability.
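
Whether an explicit Format value is needed for a given URI can be estimated with a short Python sketch. It uses the standard-library mimetypes module as a stand-in for "extensions recognized by browsers", which is an assumption; the function name is hypothetical.

```python
import mimetypes


def needs_explicit_format(uri):
    """Return True if no MIME type can be guessed from the URI's extension.

    In that case the provider should fill in the {QL} Format field.
    """
    guessed_type, _encoding = mimetypes.guess_type(uri)
    return guessed_type is None
```

For example, "http://x.y.net/images/123.jpg" can be resolved from its extension, whereas "http://x.de/rsc/123" cannot and would need a Format value.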

Value standards that are mostly specific to identification tool metadata:


Extensibility

Please add any information you consider desirable! You may have further information on resources that you consider valuable and which here may have been either simply forgotten or considered unlikely to exist. Please do provide us with such data; we may well be able to process and integrate it. Please simply use your local field name (or an appropriate translation to English) and prefix it with an “x_“. Some examples for cases that were considered but not included in the field names of the general list:

  • x_GUID: Globally unique identifiers other than the URI used to provide permanent identifiers for resources. Examples: ns.tdwg.org/something/32874, urn:lsid:authority.org:images:298347. Note: Until 2008-09, no provider had such a practice.
  • x_Rating: Ratings of technical quality, content quality, or suitability for a given purpose (e. g., identification, teaching, glossary)?
  • x_HostScientificName, x_PathogenScientificName, etc.
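
On the consuming side, extension fields can be separated from the agreed fields by their prefix. The following Python sketch is only an illustration; the function name and the sample record are hypothetical.

```python
def split_extension_fields(record, prefix="x_"):
    """Split a record into (standard fields, provider-specific extension fields)."""
    standard = {k: v for k, v in record.items() if not k.startswith(prefix)}
    extensions = {k: v for k, v in record.items() if k.startswith(prefix)}
    return standard, extensions


record = {
    "Type": "StillImage",
    "Title": "Oak infected with powdery mildew",
    "x_GUID": "urn:lsid:authority.org:images:298347",
}
standard, extensions = split_extension_fields(record)
```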

Quality control for Unicode character set

Unicode characters may become corrupted if some application in the data processing sequence does not handle them properly. To help with the quality control of data transfer, each data transfer file should include a record that consists of the fixed value “UnicodeQC” for the Type field and the literal text “«Unicode Test: ¿ŠšǍǎ – are S and A caron preserved?»” in the Title field (see example below; all other metadata fields may be empty). Please copy the test string exactly, including the guillemets but not the English quotes, using the clipboard or some other means. To control for your internal processing, please try to create this record as early as possible (at least if you suspect that you may have accented characters, e. g., in person names).

THE OLD METHOD FOR A PLAIN TABLE (NON-WIKI) EXCHANGE WAS TO ADD A SPECIAL RECORD CONTAINING A TEST VALUE. THIS NEEDS TO BE REPLACED WITH A NEW METHOD WHERE WIKI-TEMPLATE STYLE DATA ARE ATTACHED TO THE WIKI (This is not necessary, if they are pasted inside the wiki...).

Type Title
UnicodeQC «Unicode Test: ¿ŠšǍǎ – are S and A caron preserved?»
Collection …   (etc.; i. e. all other records like Collection,
StillImage …   StillImage following the UnicodeQC record)
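
A receiving application could verify the test record along the following lines. This Python sketch assumes a tab-separated Unicode text file with a header row containing Type and Title; the function name is hypothetical.

```python
import csv
import io

# The literal test string from this agreement, including the guillemets.
UNICODE_QC_TITLE = "«Unicode Test: ¿ŠšǍǎ – are S and A caron preserved?»"


def unicode_qc_passed(tsv_text):
    """Return True if a UnicodeQC record exists and its Title is intact."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row.get("Type") == "UnicodeQC":
            return row.get("Title") == UNICODE_QC_TITLE
    # No test record at all is treated as a failed check.
    return False
```

The file content must already be decoded to text before the check; a wrong decoding step is exactly the kind of corruption the record is meant to reveal.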

Data exchange using Wiki-template style

The document-based data exchange mechanisms originally employed in the surveys of month 3 and 4 of the Key to Nature project aimed at supporting partners with little IT experience and support. We found, however, that the manual aggregation method was too laborious and error-prone, and that quality control was difficult. Although these problems are well known and a reason for the widespread adoption of XML data formats and schema-based validation, we believe that a general reliance on XML-schema-based data exchange is beyond the technological capabilities of many partners. The Key to Nature Resource Metadata Exchange Agreement EU deliverable D 4.4 gives further details on the problems encountered.

In the new, wiki-based method uploading of data is optionally a manual process, but distributed among all partners. The system for uploading already reports certain quality problems. The central harvesting and integration of such data is then planned to be a fully automated process. Rather than writing custom software for a repository, we plan to use a MediaWiki installation. For the setup and steps to be taken by data providers, see Help:How to add resource metadata on the Wiki.

The key to this approach is the use of templates (infoboxes or table-row-templates). MediaWiki templates provide the following opportunities:

  • they provide a visually attractive reporting of the data, facilitating data proofreading;
  • they may provide error reporting facilities (required fields, incorrect values);
  • they are relatively trivial to parse for metadata harvesting.

We believe the Wiki method will prevent misunderstandings of the necessary relations from resources to collections to providers. Being browser-based, it usually prevents a corruption of Unicode characters, and it provides immediate feedback on certain quality problems.
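
To illustrate why template-style data are simple to harvest, here is a minimal Python sketch of a parser. The template name "Resource" and the exact field layout are assumptions for illustration only, and the sketch deliberately ignores nested templates and pipe characters inside values.

```python
import re


def parse_template(wikitext, name):
    """Return the named parameters of the first {{name|...}} call, or None."""
    pattern = re.compile(
        r"\{\{\s*" + re.escape(name) + r"\s*\|(.*?)\}\}", re.DOTALL
    )
    match = pattern.search(wikitext)
    if match is None:
        return None
    fields = {}
    for part in match.group(1).split("|"):
        if "=" not in part:
            continue  # skip positional or empty parameters
        key, value = part.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


sample = """{{Resource
| Type = StillImage
| Title = Oak infected with powdery mildew
| Metadata Language = en
| Best Quality URI = http://x.y.net/images/123
}}"""
parsed = parse_template(sample, "Resource")
```

A production harvester would more likely rely on a dedicated wikitext parser, since real pages may contain nested templates or links with pipes.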

The workplan for implementing this in WP4 (media resources) is:

  • testing the approach with the WP3 (identification tool) data loaded centrally on the wiki;
  • writing a general user guide on how anybody may follow these examples;
  • writing harvesting software that converts the Wiki metadata repository into an integrated, searchable format (using Fedora Commons or relational databases);
  • testing this with the second round of updating the WP4 data.

The first three steps are already in progress.

Depending on the technical expertise of the data provider, the process of uploading metadata to the wiki repository may be manual or automatic. A manual conversion from the local data formats can relatively easily be done with spreadsheet and word-processing software, and we will try to write a guide to this. Since the target format is essentially plain text (which is then copied into the browser-based wiki editor using the clipboard), it can be expected that many partners will be able to follow the instructions. However, this process can be fully automated, by writing data export routines and a simple web-script which updates the web pages automatically.

An essential point in this approach is that central error reporting can be automated. In a first step, the harvesting mechanism will simply ignore any data that are not fit for harvesting. The lack of the data in the search facility will already provide a primitive form of feed-back. In a second step, it can automatically add problem reports to the wiki pages it could not process.
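
The two-step error handling described above might look roughly like the following Python sketch. It is not the project's harvesting software; the function name and the choice of required fields are assumptions made for illustration.

```python
def harvest(records, required_fields):
    """Split records into harvestable ones and problem reports.

    A record is ignored (step one) if any required field is missing or
    blank; the returned problem list (record index, missing fields) is
    what a second step could write back to the wiki pages.
    """
    accepted, problems = [], []
    for index, record in enumerate(records):
        missing = [
            field for field in required_fields
            if not str(record.get(field, "")).strip()
        ]
        if missing:
            problems.append((index, missing))
        else:
            accepted.append(record)
    return accepted, problems


records = [
    {"Type": "StillImage", "Title": "Oak",
     "BestQualityURI": "http://x.y.net/images/123"},
    {"Type": "StillImage", "Title": " "},
]
accepted, problems = harvest(records, ["Title", "BestQualityURI"])
```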


Weblinks