Saturday, January 7, 2012

Linked Data

Source: http://www.ibm.com/developerworks/rational/library/basic-profile-linked-data/index.html

There is interest in using Linked Data technologies for more than one purpose. We have seen interest in it to expose information -- public records, for example -- on the Internet in a machine-readable format. The IBM® Rational® team has been using Linked Data as an architectural model and implementation technology for application integration.

We would like to share information about how we are using these technologies, the best practices and anti-patterns that we have identified, and the specification gaps that we have had to fill. These best practices and anti-patterns can be classified according to (but are not limited to) the following categories:
  • Resources
A summary of the HTTP and RDF standard techniques and best practices that you should use, and anti-patterns you should avoid, when constructing clients and servers that read and write Linked Data 
  • Containers
Defines resources that allow new resources to be created using HTTP POST and existing resources to be found using HTTP GET 
  • Paging
Defines a mechanism for splitting the information in large resources into pages that can be fetched incrementally 
  • Validation
Defines a simple mechanism for describing the properties that a particular type of resource must or may have

The following sections provide details regarding this proposal for a Basic Profile for Linked Data.

Basic Profile Resources

Basic Profile Resources are HTTP Linked Data resources that conform to simple patterns and conventions. Most Basic Profile Resources are domain-specific resources that contain data for an entity in a domain. All Basic Profile Resources follow the rules of Linked Data:
  1. Use URIs as names for things. 
  2. Use HTTP URIs so that people can look up those names. 
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 
  4. Include links to other URIs so that people can discover more things.
Basic Profile adds a few rules. Some of these rules could be thought of as clarification of the basic Linked Data rules.
  1. Basic Profile Resources are HTTP resources that can be created, modified, deleted and read using standard HTTP methods.
    Basic Profile Resources are created by HTTP POST (or PUT) to an existing resource, deleted by HTTP DELETE, updated by HTTP PUT or PATCH, and "fetched" using HTTP GET. Additionally, Basic Profile Resources can be created, updated, and deleted by using SPARQL Update.
  2. Basic Profile Resources use RDF to define their states.
    The state of a Basic Profile Resource (in the sense of state used in the REST architecture) is defined by a set of RDF triples. Binary resources and text resources are not Basic Profile Resources since their states cannot be easily or fully represented in RDF. XML resources might or might not be suitable as Basic Profile Resources. Some XML resources are really data-oriented resources encoded in XML that can be easily represented in RDF. Other XML documents are essentially marked up text documents that are not easily represented in RDF. Basic Profile Resources can be mixed with other resources in the same application.
  3. You can request an RDF/XML representation of any Basic Profile Resource.
    The resource might have other representations as well. These could be other RDF formats, such as Turtle, N3, or N-Triples, but non-RDF formats such as HTML and JSON would also be popular additions, and Basic Profile sets no limits.
  4. Basic Profile clients use Optimistic Collision Detection during update.
    Because the update process involves getting a resource first, and then modifying it and later putting it back on the server, there is the possibility of a conflict (for example, another client might have updated the resource since the GET action). To mitigate this problem, Basic Profile implementations should use the HTTP If-Match header and HTTP ETags to detect collisions.
  5. Basic Profile Resources use standard media types.
    Basic Profile does not require and does not encourage the definition of any new media types. A Basic Profile goal is that any standards-based RDF or Linked Data client be able to read and write Basic Profile data, and defining new media types would prevent that in most cases.
  6. Basic Profile Resources use standard vocabularies.
    Basic Profile Resources use common vocabularies (classes, properties, and so forth) for common concepts. Many websites define their own vocabularies for common concepts such as resource type, label, description, creator, last modification time, priority, enumeration of priority values, and so on. This is usually viewed as a good feature by users who want their data to match their local terminology and processes, but it makes it much harder for organizations to subsequently integrate information in a larger view. Basic Profile requires all resources to expose common concepts using a common vocabulary for properties. Sites can choose to additionally expose the same values under their own private property names in the same resources. In general, Basic Profile avoids inventing property names where possible. Instead, it uses ones from popular RDF-based standards, such as the RDF standards themselves, Dublin Core, and so on. Basic Profile invents property URLs where no match is found in popular standard vocabularies.
  7. Basic Profile Resources set rdf:type explicitly.
    A resource's membership in a class extent can be derived implicitly or indicated explicitly by a triple in the resource representation that uses the rdf:type predicate and the URL of the class. In RDF, there is no requirement to place an rdf:type triple in each resource, but this is a good practice, because it makes a query more useful in cases where inferencing is not supported. Remember also that a single resource can have multiple values for rdf:type. Basic Profile sets no limits to the number of types a resource can have.
  8. Basic Profile Resources use a restricted number of standard data types.
    RDF does not define data types to be used for property values, so Basic Profile lists a set of standard datatypes to be used in Basic Profile.
  9. Basic Profile clients expect to encounter unknown properties and content.
    Basic Profile provides mechanisms for clients to discover lists of expected properties for resources for particular purposes, but it also assumes that any given resource might have many more properties than those listed. Some servers will support only a fixed set of properties for a particular type of resource. Clients should always assume that the set of properties for a resource of a particular type at an arbitrary server might be open, in the sense that different resources of the same type might not all have the same properties, and the set of properties that are used in the state of a resource is not limited to any predefined set. However, when dealing with Basic Profile Resources, clients should assume that a Basic Profile server might discard triples for properties of which it has no prior knowledge. In other words, servers can restrict themselves to a known set of properties, but clients cannot. When doing an update using HTTP PUT, a Basic Profile client must preserve all property values retrieved by using HTTP GET. This includes all property values that it doesn't change or understand. (Use of HTTP PATCH or SPARQL Update rather than HTTP PUT for updates avoids this burden for clients.)
  10. Basic Profile clients do not assume the type of a resource at the end of a link.
    Many specifications and most traditional applications have a "closed model," by which we mean that any reference from a resource in the specification or application necessarily identifies a resource in the same specification (or a referenced specification) or application. In contrast, the HTML anchor tag can point to any resource addressable by an HTTP URI, not just other HTML resources. Basic Profile works like HTML in this sense. An HTTP URI reference in one Basic Profile Resource can, in general, point to any resource, not just a Basic Profile Resource. There are numerous reasons to maintain an open model like HTML's. One is that it allows data that has not yet been defined to be incorporated in the web in the future. Another reason is that it allows individual applications and sites to evolve over time. If clients assume that they know what will be at the other end of a link, then the data formats of all resources across the transitive closure of all links must be kept stable for version upgrade. A consequence of this independence is that client implementations that traverse HTTP URI links from one resource to another should always code defensively and be prepared for any resource at the end of the link. Defensive coding by client implementers is necessary to allow sets of applications that communicate through Basic Profile to be independently upgraded and flexibly extended.
  11. Basic Profile servers implement simple validations for Create and Update.
    Basic Profile servers should try to make it easy for programmatic clients to create and update resources. If Basic Profile implementations associate a lot of very complex validation rules that need to be satisfied for an update or creation to be accepted, it becomes difficult or impossible for a client to use the protocol without extensive additional information specific to the server that needs to be communicated outside of the Basic Profile specifications. The recommended approach is for servers to allow creation and updates based on the sort of simple validations that can be communicated programmatically through a Shape (see the Constraints section). Additional checks that are required to implement more complex policies and constraints should result in the resource being flagged as requiring more attention, but should not cause the basic Create or Update action to fail.
  12. Basic Profile Resources always use simple RDF predicates to represent links.
    By always representing links as simple predicate values, Basic Profile makes it very simple to know how links will appear in representations and also makes it very simple to query them. When there is a need to express properties on a link, Basic Profile adds an RDF statement with the same subject, object, and predicate as the original link, which is retained, plus any additional "link properties." Basic Profile Resources do not use "inverse links" to support navigation of a relationship in the opposite direction, because this creates a data synchronization problem and complicates a query. Instead, Basic Profile assumes that clients can use queries to navigate relationships in the opposite direction from the direction supported by the underlying link.
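
Rules 4 and 9 above interact during an update: the client fetches the resource, changes what it needs, keeps everything else, and sends the result back with the ETag it received. The following is a minimal Python sketch of that cycle. The InMemoryServer class, the update_title helper, and the tuple-based triple format are hypothetical stand-ins chosen for illustration; they are not part of Basic Profile or of any real library, and a real client would issue HTTP GET and PUT requests with an If-Match header instead.

```python
import hashlib


class InMemoryServer:
    """Hypothetical stand-in for a server that holds one resource's triples."""

    def __init__(self, triples):
        self._triples = set(triples)

    @staticmethod
    def _etag(triples):
        # Derive the ETag from the sorted triples so any state change alters it.
        digest = hashlib.sha256(repr(sorted(triples)).encode("utf-8")).hexdigest()
        return '"' + digest[:16] + '"'

    def get(self):
        """Return (status, etag, triples), roughly what an HTTP GET provides."""
        return 200, self._etag(self._triples), set(self._triples)

    def put(self, triples, if_match):
        """Replace the state only if the caller's ETag still matches."""
        if if_match != self._etag(self._triples):
            return 412  # Precondition Failed: another client updated first
        self._triples = set(triples)
        return 204


DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def update_title(server, subject, new_title):
    """GET the resource, change one property, PUT everything back."""
    _, etag, triples = server.get()
    # Keep every triple we are not changing, including ones we do not understand.
    kept = {t for t in triples
            if not (t[0] == subject and t[1] == DCTERMS_TITLE)}
    kept.add((subject, DCTERMS_TITLE, new_title))
    return server.put(kept, if_match=etag)
```

A client that receives 412 would typically repeat the GET, reapply its change to the fresh state, and try the PUT again.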

Common properties

The tables that follow list properties from well-known RDF vocabularies that are recommended for use in Basic Profile Resources. Basic Profile requires none of them, but a specification based on Basic Profile might require one or more of these properties for a particular type of resource.

Commonly used namespace prefixes

Prefix  | Namespace URI
dcterms | http://purl.org/dc/terms/
rdf     | http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs    | http://www.w3.org/2000/01/rdf-schema#
bp      | http://open-services.net/ns/basicProfile#
xsd     | http://www.w3.org/2001/XMLSchema#


From Dublin Core
URI: http://purl.org/dc/terms/

Property | Range | Comment
dcterms:contributor | dcterms:Agent | The identifier of a resource (or blank node) that is a contributor of information. This resource can be a person or group of people or, possibly, an automated system.
dcterms:creator | dcterms:Agent | The identifier of a resource (or blank node) that is the original creator of the resource. This resource can be a person or group of people or, possibly, an automated system.
dcterms:created | xsd:dateTime | The creation timestamp.
dcterms:description | rdf:XMLLiteral | Descriptive text about the resource represented as rich text in XHTML format. Should include only content that is valid and suitable inside an XHTML element.
dcterms:identifier | rdfs:Literal | A unique identifier for the resource. Typically read-only and assigned by the service provider when a resource is created. Not typically intended for end-user display.
dcterms:modified | xsd:dateTime | Date on which the resource was changed.
dcterms:relation | rdfs:Resource | The URI of a related resource. This is the predicate to use when you do not know what else to use. If you know what kind of relationship it is, use a more specific predicate.
dcterms:subject | rdfs:Resource | Should be a URI (see dbpedia.org). From Dublin Core: "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element."
dcterms:title | rdf:XMLLiteral | A name given to the resource. Represented as rich text in XHTML format. Should include only content that is valid inside an XHTML element.

From RDF
URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#


Property | Range | Comment
rdf:type | rdfs:Class | The type or types of the resource. Basic Profile recommends that the rdf:type(s) of a resource be set explicitly in resource representations to facilitate query with non-inferencing query engines.

From RDF Schema
URI: http://www.w3.org/2000/01/rdf-schema#

Property | Range | Comment
rdfs:member | rdfs:Resource | The URI (or blank node identifier) of a member of a Container.
rdfs:label | rdfs:Literal | "Provides a human-readable version of a resource name." (From RDFS)

Basic Profile Container
Many HTTP applications and sites have organizing concepts that partition the overall space of resources into smaller Containers. Blog posts are grouped into blogs, wiki pages are grouped into wikis, and products are grouped into catalogs. Each resource created in the application or site is created within an instance of one of these Container-like entities, and users can list the existing artifacts within one. There is no agreement across applications or sites, even within a particular domain, on what these grouping concepts should be called, but they commonly exist and are important. Containers answer two basic questions:
  1. To which URLs can I POST to create new resources?
  2. Where can I GET a list of existing resources?
We call RDF Containers that you can POST to "Basic Profile Containers." Here are some of their characteristics:
  • Clients can retrieve the list of existing resources in a Basic Profile Container.
  • New resources are created in Basic Profile Containers by POSTing to them.
  • Any resource can be POSTed to a Basic Profile Container. A resource does not have to be a Basic Profile Resource with an RDF representation to be POSTed to a Basic Profile Container.
  • After POSTing a new resource to a Container, the new resource will appear as a member of the Container until it is deleted. A Container can also contain resources that were added through other means, for example through the user interface of the site that implements the Container.
  • The same resource can appear in multiple Containers. This happens commonly if one Container is a "view" onto a larger Container.
  • Clients can get partial information about a Basic Profile Container without retrieving a full representation of all of its contents.
The representation of a Basic Profile Container is a standard RDF Container representation that uses the rdfs:member predicate.

Representation of a Basic Profile Container
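@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

# The example.org URIs below are illustrative.
<http://example.org/container1>
        a rdfs:Container;
        rdfs:member <http://example.org/container1/member1>;
        # … 999999998 more triples here …
        rdfs:member <http://example.org/container1/member1000000000>.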

The Basic Profile does not recognize or recommend the use of other forms of an RDF Container, such as Bag and Seq, because they are not friendly to query. This follows standard Linked Data guidance for RDF use.
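
To make the shape of this representation concrete, here is a small Python sketch that writes a Container as Turtle, with one rdfs:member triple per member. The function name container_to_turtle and the example.org URIs used with it are assumptions made for illustration. The sketch does no IRI escaping, so it assumes the URIs it is given are already valid.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def container_to_turtle(container_uri, member_uris):
    """Serialize a Container and its members as Turtle text."""
    # One predicate-object pair per line, all sharing the Container as subject.
    statements = ["a rdfs:Container"]
    statements += [f"rdfs:member <{member}>" for member in member_uris]
    body = ";\n".join("    " + statement for statement in statements) + "."
    return f"@prefix rdfs: <{RDFS}>.\n\n<{container_uri}>\n{body}\n"


if __name__ == "__main__":
    print(container_to_turtle(
        "http://example.org/container1",
        ["http://example.org/container1/member1",
         "http://example.org/container1/member2"],
    ))
```

Because every member is an ordinary triple with the same predicate, a query engine can find members with a single triple pattern, which is the property that Bag and Seq lack.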

The Basic Profile recommends the use of a set of standard Dublin Core properties with Containers. The subject of triples using these properties is the Container itself.
rdfs:Container domain properties
Property | Occurs | Range | Comment
dcterms:title | zero or one | rdf:XMLLiteral | A name given to the resource. Represented as rich text in XHTML format. Should include only content that is valid inside an XHTML element.
dcterms:description | zero or one | rdf:XMLLiteral | Descriptive text about the resource represented as rich text in XHTML format. Should include only content that is valid and suitable inside an XHTML element.
dcterms:publisher | zero or one | dcterms:Agent | An entity responsible for making the Basic Profile Container and its members available.
bp:containerPredicate | exactly one | rdfs:Property | The predicate of the triples whose objects define the contents of the Container.

Retrieving non-member properties

The representation of a Container that has many members will be large. When we looked at our use cases, we saw that there were several important cases where clients needed to access only the non-member properties of the Container. Because retrieving the whole Container representation to get this information is onerous, we were motivated to define a way to retrieve only the non-member property values. We do this by defining a corresponding resource for each Basic Profile Container, called the "non-member resource," which has a state that is a subset of the state of the Container. The non-member resource's HTTP URI can be derived in the following way:

If the HTTP URI of the Container is {url}, then the HTTP URI of the related non-member resource is {url}?non-member-properties. The representation of {url}?non-member-properties is identical to the representation of {url}, except that the membership triples are missing. The subjects of the triples will still be {url}, not {url}?non-member-properties. Any server that does not support non-member resources should return an HTTP 404 Not Found error when a non-member resource is requested.

This approach is analogous to using HTTP HEAD rather than HTTP GET. The difference is that HTTP HEAD fetches only the response headers for a resource, whereas HTTP GET requests its entire representation.

HTTP GET example, request
GET /container1?non-member-properties HTTP/1.1
Host: example.org
Accept: text/turtle

HTTP GET example, response
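@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix bp: <http://open-services.net/ns/basicProfile#>.

# The publisher URI below is illustrative.
<http://example.org/container1>
        a rdfs:Container;
        dcterms:title "A Basic Profile Container of Acme Resources";
        bp:containerPredicate rdfs:member;
        dcterms:publisher <http://acme.com/>.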
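
The request and response above can be sketched from the server side in Python. This is only an illustration under stated assumptions: the function names, the tuple-based triple format, and the supports_non_member flag are hypothetical, and the sketch appends ?non-member-properties exactly as described, without handling Container URLs that already carry a query string.

```python
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"


def non_member_url(container_url):
    """Derive the non-member resource URL from the Container URL."""
    return container_url + "?non-member-properties"


def non_member_triples(container_url, triples, container_predicate=RDFS_MEMBER):
    """Drop the membership triples; subjects stay as the Container URL."""
    return [t for t in triples
            if not (t[0] == container_url and t[1] == container_predicate)]


def get_representation(request_url, container_url, triples,
                       supports_non_member=True):
    """Return (status, triples) for a GET on the Container or its subset."""
    if request_url == container_url:
        return 200, list(triples)
    if request_url == non_member_url(container_url):
        if not supports_non_member:
            return 404, []  # servers without this feature answer Not Found
        return 200, non_member_triples(container_url, triples)
    return 404, []
```

The container_predicate parameter defaults to rdfs:member; a server could pass whatever predicate it advertises through bp:containerPredicate.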

Basic Profile validation and constraints

Basic Profile resources are RDF resources, and RDF has the happy characteristic that "it can say anything about anything." This means that, in principle, any resource can have any property and there is no requirement that any two resources have the same set of properties, even if they have the same type or types. In practice, though, the properties that are set on resources usually follow regular patterns that are dictated by the uses of those resources. Although a particular resource might have arbitrary properties, when viewed from the perspective of a particular application or use case, the set of properties and property values that are appropriate for that resource in that application will often be predictable and constrained. For example, if a server has resources that represent software products and bugs, for the purposes of displaying information in tabular formats, creating and updating resources, or other purposes, a client might want to know what properties software products and bugs have on that server. The Basic Profile Validation and Constraints specification aims to capture information about those properties and constraints.
The distinction between the resource and the use cases that it participates in is important to us. Traditional technologies such as relational databases constrain the total set of properties that an entity can have. In the Basic Profile, we aim only to define the properties that a resource can have when viewed through the lens of a particular application or use case, while retaining the ability of the same resource to have an arbitrary set of properties to support other applications and use cases.

The set of properties that a resource can or will have is not necessarily linked to its type, but exploiting the pattern where resources of the same type have the same properties is a very traditional approach that supports the development of many useful applications. Sometimes, knowledge of types and properties for the application is hard-coded in software, but there are many cases where it is desirable to represent this knowledge in data. The Basic Profile provides resource types called Shape and PropertyConstraint to represent this data.
Note on the relationship of Shape to other standards:
Although we're all very familiar, from relational databases and object-oriented programming, with the model where the valid properties are constrained by the type, it is not the "natural" model of RDF, nor is it the model of the natural world. The familiar model says that if you are of type X, you will have these properties that will have values of certain types. RDF and, to a large degree, the natural world work the other way around; if you have these properties, you must be of type X. We are not aware of any OWL or RDFS construct that lets you say "from the perspective of application X, resources with an RDF type of Y will have the list of properties Z," nor of one that constrains the types of the values of these properties.

Class: PropertyConstraint
URI: http://open-services.net/ns/basicProfile#PropertyConstraint
bp:PropertyConstraint domain properties

Property | Occurs | Range | Comment
rdfs:label | zero or one | rdfs:Literal | A human-readable name for the subject. (from rdfs)
rdfs:comment | zero or one | rdfs:Literal | A description of the subject resource. (from rdfs)
bp:constrainedProperty | exactly one | rdfs:Property | The URI of the predicate being constrained.
bp:rangeShape | zero or one | bp:Shape | A bp:Shape that describes the rdfs:Class that is the range of the property.
bp:allowedValue | zero or many | range of the subject | A value allowed for the property. If there are both bp:allowedValue elements and a bp:AllowedValues resource, then the full set of allowed values is the union of both.
bp:AllowedValues | zero or many | bp:AllowedValues | A resource with allowed values for the property being defined.
bp:defaultValue | zero or one | range of the object | A default value for the property.
bp:occurs | exactly one | rdfs:Resource | Must be one of the following: http://open-services.net/ns/basicProfile#Exactly-one, http://open-services.net/ns/basicProfile#Zero-or-one, http://open-services.net/ns/basicProfile#Zero-or-many, or http://open-services.net/ns/basicProfile#One-or-many.
bp:readOnly | zero or one | Boolean | true if the property is read-only. If not set or set to false, then the property is writable. Providers should declare a property read-only when changes to the value of that property will not be accepted on PUT. Consumers should note that the converse does not apply: Providers may reject a change to the value of a writable property.
bp:maxSize | zero or one | Integer | For String properties only, specifies the maximum number of characters allowed. If not set, then there is no maximum or the maximum is specified elsewhere.
bp:valueType | zero or one | rdfs:Resource | For literals, see XSD Datatypes.


It is debatable whether we should have a separate bp:PropertyConstraint class with a property on it called bp:constrainedProperty, or whether it would be better to use rdfs:Property and simply define new predicates with rdfs:Property as the domain.

Important:
However, it is important not to use rdfs:range, because the semantics are different.

Class: AllowedValues
URI: http://open-services.net/ns/basicProfile#AllowedValues
bp:AllowedValues domain properties

Property | Occurs | Range | Comment
bp:allowedValue | zero or many | same as range of owning property | Allowed value


Class: Shape
URI: http://open-services.net/ns/basicProfile#Shape
bp:Shape domain properties

Property | Occurs | Range | Comment
dcterms:title | zero or one | rdf:XMLLiteral | Title
bp:describedClass | exactly one | rdfs:Class | Class described
bp:propertyConstraints | zero or one | rdf:List | The list of PropertyConstraints for properties of this Shape. The domains of the PropertyConstraints must be compatible with the describedClass.
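
As an illustration of how a client or server might use a Shape, the following Python sketch models a Shape and its PropertyConstraints as plain data classes and checks a resource's properties against them. The class and function names, the dict-of-lists input format, and the wording of the problem messages are assumptions made for this sketch, and it covers only bp:occurs, bp:allowedValue, and bp:maxSize. Properties that the Shape does not mention are ignored, in keeping with the open model described earlier.

```python
from dataclasses import dataclass, field
from typing import Optional

BP = "http://open-services.net/ns/basicProfile#"

# (minimum, maximum) number of values per bp:occurs value; None means unbounded.
OCCURS = {
    BP + "Exactly-one": (1, 1),
    BP + "Zero-or-one": (0, 1),
    BP + "Zero-or-many": (0, None),
    BP + "One-or-many": (1, None),
}


@dataclass
class PropertyConstraint:
    constrained_property: str
    occurs: str
    allowed_values: tuple = ()
    max_size: Optional[int] = None


@dataclass
class Shape:
    described_class: str
    property_constraints: list = field(default_factory=list)


def validate(shape, properties):
    """Check a dict of predicate URI -> list of values; return problem strings."""
    problems = []
    for pc in shape.property_constraints:
        values = properties.get(pc.constrained_property, [])
        low, high = OCCURS[pc.occurs]
        if len(values) < low or (high is not None and len(values) > high):
            problems.append(f"{pc.constrained_property}: found {len(values)} "
                            f"value(s), expected {pc.occurs.split('#')[-1]}")
        for value in values:
            if pc.allowed_values and value not in pc.allowed_values:
                problems.append(f"{pc.constrained_property}: {value!r} is not allowed")
            if (pc.max_size is not None and isinstance(value, str)
                    and len(value) > pc.max_size):
                problems.append(f"{pc.constrained_property}: longer than {pc.max_size}")
    return problems
```

Returning a list of problems rather than raising an error leaves the caller free to reject the request or to accept it and flag the resource for attention.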


Validation semantics

Validation semantics are expressed by mapping the property and class definitions in terms of SPARQL ASK semantics. This enables a declarative way in RDF to define the constraints while using the existing SPARQL ASK specification.
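
The text above does not spell out the mapping, so the following Python sketch shows only one plausible way a bp:occurs constraint could be turned into SPARQL ASK queries. Each generated query returns true when some resource of the described class violates the constraint. The function name violation_queries, the short occurrence labels it accepts, and the query patterns themselves are assumptions made for illustration. FILTER NOT EXISTS requires SPARQL 1.1.

```python
def violation_queries(described_class, prop, occurs):
    """Build ASK queries that are true when a resource violates `occurs`.

    `occurs` is the local name of a bp:occurs value, for example "Exactly-one".
    """
    queries = []
    if occurs in ("Exactly-one", "One-or-many"):
        # Violation: a resource of the class has no value for the property.
        queries.append(
            f"ASK {{ ?s a <{described_class}> . "
            f"FILTER NOT EXISTS {{ ?s <{prop}> ?v }} }}"
        )
    if occurs in ("Exactly-one", "Zero-or-one"):
        # Violation: a resource of the class has two distinct values.
        queries.append(
            f"ASK {{ ?s a <{described_class}> ; <{prop}> ?v1 ; <{prop}> ?v2 . "
            f"FILTER (!sameTerm(?v1, ?v2)) }}"
        )
    return queries


if __name__ == "__main__":
    for query in violation_queries("http://example.org/ns#Bug",
                                   "http://purl.org/dc/terms/title",
                                   "Exactly-one"):
        print(query)
```

A Zero-or-many constraint produces no queries, because no number of values can violate it.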

Associating Shapes and Containers

It is useful to be able to specify for a Container what types of members it will return and accept, plus what properties it expects to be used with resources of those types. To enable this, the Basic Profile defines two new Container properties.
rdfs:Container domain properties

Property | Occurs | Range | Comment
bp:createShape | zero or many | bp:Shape | One or more Shapes that provide information on the expected data formats of resources that can be POSTed to the Container to create new members.
bp:readShape | zero or many | bp:Shape | One or more Shapes that provide information on the expected data formats of resources that can be found as members of the Container.
Containers often add properties of their own to POSTed and PUT resources (creation date, modification date, creator), and it's useful for clients to know what these might be.


