PFIF Draft Specification
September 3, 2005
Kieran Lal, Jonathan Plax, Ka-Ping Yee

Editor: Ka-Ping Yee <ping@zesty.ca>

This document is licensed under the GNU Free Documentation License 1.2.

(Still to be done: examples of PFIF documents, Atom and RSS feeds, and a worked example of importing and merging records into a database. These might not need to be included within this specification.)

Abstract
Design Principles
Data Model
1. PERSON records
2. NOTE records
XML Format Specification
Atom Feed Specifications
1. Atom Person Feeds
2. Atom Note Feeds
RSS Feed Specifications
1. RSS Person Feeds
2. RSS Note Feeds
Suggested Relational Database Schema
Acknowledgements

1. Abstract

This document defines the People Finder Interchange Format, which encompasses both a data model and an XML-based exchange format for sharing data about people who are missing or displaced by natural or human-made disasters. The data model is first described in a manner independent of implementation style (object-oriented, relational, or XML). Then the PFIF proper is specified by an XML Schema. This document also provides a recommended schema for handling PFIF data in a relational database, though such implementation decisions are ultimately up to application developers.

2. Design Principles

The purpose of PFIF is to bring people and data together. The design aims to promote convergence: convergence of people who seek the same person, convergence of information about a person obtained from various sources, convergence of duplicated data, and ultimately convergence of missing people with their loved ones.
Data is fundamentally divided into two types: data that is fixed and data that changes over time.
Data is not to be thrown away. Incoming updates are timestamped and added to the pool of knowledge; they do not replace or destroy existing data. This approach improves the resilience of the system and allows applications to freely aggregate data from different sources without the complexity of resolving conflicting changes.
Data should be traceable. Since data comes from sources of unknown reliability and accountability, information on the origins of data should be maintained, to help users ascertain its trustworthiness.
Each aggregator of data has its own perspective on the world. This specification avoids assuming that it is possible to dictate truths about the data from a single central authority.
It should be possible to keep track of multiple records that refer to the same person. But, by the preceding principle, each aggregator makes its own decisions about which records to merge; there is no central authority.
It should be possible to resolve multiple copies of the same record that have been imported via different data paths.
All dates must be in UTC, never in a local time zone, because data records will be transmitted among many different time zones. This format uses dates in the RFC 3339 format, with only UTC allowed. Front-ends can convert dates to the local time zone for display.
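A minimal Python sketch of this date rule follows. The function names to_pfif_date and from_pfif_date are illustrative and not part of PFIF. The sketch assumes whole-second precision, matching the "yyyy-mm-ddThh:mm:ssZ" form used for the date fields in this document.

```python
from datetime import datetime, timedelta, timezone

# Assumed format string for the "yyyy-mm-ddThh:mm:ssZ" form.
PFIF_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_pfif_date(dt: datetime) -> str:
    """Convert a time-zone-aware datetime to a UTC date string."""
    if dt.tzinfo is None:
        # A naive datetime has no known zone, so it cannot be converted safely.
        raise ValueError("naive datetime: time zone is unknown")
    return dt.astimezone(timezone.utc).strftime(PFIF_DATE_FORMAT)


def from_pfif_date(text: str) -> datetime:
    """Parse a UTC date string into a time-zone-aware datetime."""
    parsed = datetime.strptime(text, PFIF_DATE_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


# A record entered at 19:30 local time in a zone five hours behind UTC.
local = datetime(2005, 9, 3, 19, 30, 0, tzinfo=timezone(timedelta(hours=-5)))
stamp = to_pfif_date(local)
```

A front-end would call from_pfif_date and then convert the result to the viewer's local zone for display only; the stored value stays in UTC.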

3. Data Model

There are two types of records. person records are for static information. note records are for changing information. Each note record belongs to a particular person, and a person record may have any number of associated note records. Once a record is created, it is never changed. To indicate the fact that data about a particular person has changed, add a timestamped note record associated with that person record.

person records may be created both by those who seek a missing person and by those who have information on a missing person. The person record for a person is the point of convergence for all parties; the note records on that person are the growing pool of shared knowledge.

3.1. person records

A person record contains 17 fields. There may be multiple person records for the same person. In fact, any given application that imports data from multiple sources is likely to have multiple person records for the same person. It is up to the application to associate the records (see the Suggested Relational Database Schema section below). It is recommended that applications keep copies of all the records, and separately keep track of which records correspond to the same person.

Static Tracking Information About the Record Itself (8 fields)

Meta-information like this is essential because it allows people to trace and ascertain the reliability of the data they are looking at, which was a big problem with survivor databases for September 11.

person_record_id (string): Unique identifier for the record, which consists of a domain name followed by a slash and a local identifier. Each application is assumed to reside at or belong to a particular domain. When the person_record_id begins with the application's own domain, it means this application is the authority for this record, and the format of the local identifier is up to the application. When the person_record_id begins with some other domain, it means this record is a clone of a record from another source, and the other domain is the authority for this record.
entry_date (string in the form "yyyy-mm-ddThh:mm:ssZ"): Date that this record was created in the local application, in UTC. The application must guarantee that this value increases monotonically so that a client can update a copy of a database by querying for all records with an entry_date greater than the entry_date of the last received record.
author_name (string): The full name of the person who entered this record.
author_email (string): The preferred contact e-mail address of the person who entered this record.
author_phone (string): The preferred contact phone number of the person who entered this record.
source_name (string): If this data record is a clone of a record from an external source, this field is present and contains the name of that external source.
source_date (string in the form "yyyy-mm-ddThh:mm:ssZ"): If this data record is a clone of a record from an external source, this field is present and contains the date in UTC that this record was originally posted to that source location.
source_url (string): If this data record is a clone of a record from an external source, this field is present and contains the URL to that source record (as specific as possible, down to the URL of the individual record).
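The person_record_id convention can be illustrated with a short Python sketch. The names RecordId, parse_record_id and is_authoritative, and the identifiers under example.org and example.net, are hypothetical. Comparing domains without regard to case is an assumption of this sketch, not a rule stated by this specification.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    domain: str    # the authority for the record
    local_id: str  # format chosen by that authority


def parse_record_id(record_id: str) -> RecordId:
    """Split "domain/local-id" at the first slash."""
    domain, slash, local_id = record_id.partition("/")
    if not slash or not domain or not local_id:
        raise ValueError(f"not a domain/local-id pair: {record_id!r}")
    return RecordId(domain, local_id)


def is_authoritative(record_id: str, own_domain: str) -> bool:
    """True if this application is the authority for the record."""
    # Assumption: domain names are compared case-insensitively.
    return parse_record_id(record_id).domain.lower() == own_domain.lower()


# An application at example.org holding one local record and one clone.
local_record = "example.org/1234"
cloned_record = "example.net/p-77"
```

Here is_authoritative(local_record, "example.org") is true, while the same call for cloned_record is false, which tells the application that example.net remains the authority for that record.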

Static Identifying Information About a Missing Person (9 fields)

These fields are specifically for identifying the person and should be for data that never changes. These are the fields to search on. Insisting on all capitals and no accents is ugly, but it makes searches more likely to converge on the correct record. The other field is a very crude way to import foreign data, but the formatting guidelines should make it possible to extract the data again if there is a desperate need to do so. For other, free-form text was chosen instead of XML to make it easy for an application to display other directly in the UI.

first_name (string, all capitals, no accents): First name of the person sought or found, optionally followed by a space and any middle names or middle initials.
last_name (string, all capitals, no accents): Last name of the person sought or found.
home_city (string, city name, all capitals, no accents): Home city of the person sought or found.
home_state (string, two-letter postal abbreviation): Home state of the person sought or found.
home_neighborhood (string, all capitals, no accents): Name of the home neighborhood of the person sought or found.
home_street (string, all capitals, no accents): Street name (no number) of the home address of the person sought or found.
home_zip (integer): Zip code of the home address of the person sought or found.
photo_url (string): URL to an image of an identifying photograph of the person sought or found.
other (large string): Free-form text containing any other static data fields brought in from other sources. (Non-static data imported from other sources should go into a note record.) Short fields should be on a single line with the field name, a colon, and the field value. Long fields can be given as a line with the field name and a colon, then text indented on the following lines. Field names for data imported from other applications should begin with the domain name and a slash. So, for example, if a birthdate is imported from an ICRC record it might look like this:
icrc.org/birthdate: 1976-02-26
A description of the person in free-form text can also go here, with the field name "description". For example:
description:
    Dark hair, in her late thirties.
    Also goes by the names "Kate" or "Katie".
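To show that data in the other field can be extracted again, here is a rough Python sketch of a parser for the formatting guidelines above. The function name parse_other is hypothetical. The sketch assumes that continuation lines of a long field begin with whitespace, as the guidelines describe, and that field names contain no colon.

```python
def parse_other(text):
    """Parse the free-form "other" field into a dict of field name to value."""
    fields = {}
    current = None  # name of the most recent field, for continuation lines
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: continues the value of the most recent field.
            if current is None:
                raise ValueError("indented text before any field name")
            previous = fields[current]
            addition = line.strip()
            fields[current] = f"{previous}\n{addition}" if previous else addition
            continue
        # Unindented line: "name: value" or "name:" starting a long field.
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"expected 'name: value', got {line!r}")
        current = name.strip()
        fields[current] = value.strip()
    return fields


sample = (
    "icrc.org/birthdate: 1976-02-26\n"
    "description:\n"
    "    Dark hair, in her late thirties.\n"
    '    Also goes by the names "Kate" or "Katie".\n'
)
parsed = parse_other(sample)
```

In this sketch parsed maps "icrc.org/birthdate" to "1976-02-26" and "description" to the two description lines joined by a newline.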

3.2. note records

Each note record belongs to exactly one person record. There may be any number of note records associated with a particular person record. (See below for implementation notes. A database might implement this by including a foreign key, person_record_id, that refers to the person record. An object-oriented representation might implement this by embedding a list of note objects within the person object.)

Not being able to remove or update records was a huge problem with September 11 survivor databases. note records resolve this problem while avoiding the problem of synchronizing conflicting changes. Every note has a timestamp and information on the author of the note. Applications can use the timestamp to determine the most recent value of a given field. Users can use the author information to ascertain the reliability of a given field.

Information About a Missing Person That Changes Over Time (10 fields)

The found, email_of_found_person, phone_of_found_person and last_known_location fields store data that changes over time. When these fields are present in a note record, the record is specifying new values for these fields, and the entry_date field indicates the date that the new values took effect. So, for example, an application that wants to display the most recent known location can look for the note with the latest entry_date that has a non-empty last_known_location field.

note_record_id (string): Unique identifier for the record, which consists of the application's domain name followed by a slash and a local identifier. For notes entered locally, the format of the local identifier is up to the application. For notes imported from external sources, the application should preserve the value of this field.
entry_date (string in the form "yyyy-mm-ddThh:mm:ssZ"): Date that this note was added. In most cases, notes should be chronologically sorted for display.
author_name (string): The full name of the person who entered this note.
author_email (string): The preferred contact e-mail address of the person who entered this note.
author_phone (string): The preferred contact phone number of the person who entered this note.
found (boolean string, "true" or "false"): This value is "true" if the missing person has been personally contacted or seen, or "false" otherwise. The text field of this note MUST describe HOW and WHEN the person was contacted or seen.
email_of_found_person (string): The preferred contact e-mail address of the FOUND person. This field is present ONLY if the person has been FOUND. The text field of this note MUST describe HOW the person's contact information was determined.
phone_of_found_person (string): The preferred contact phone number of the FOUND person. This field is present ONLY if the person has been FOUND. The text field of this note MUST describe HOW the person's contact information was determined.
last_known_location (string): A free-form description of the last known location of the person being sought, including the city, state, and as much detail as possible. The text field of this note MUST describe HOW the person's location was determined.
text (large string): Free-form text description of the person's current condition, situation and location details, where they were last seen, corrections to other information, etc.
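The lookup described at the start of this section (the latest note with a non-empty value for a field) might be sketched in Python as follows. The function name latest_value and the sample notes are purely illustrative. Notes are represented here as plain dictionaries, which is an assumption of this sketch rather than a required representation.

```python
def latest_value(notes, field):
    """Return the most recent non-empty value of a field, or None."""
    candidates = [note for note in notes if note.get(field) not in (None, "")]
    if not candidates:
        return None
    # Dates are fixed-width "yyyy-mm-ddThh:mm:ssZ" strings in UTC, so
    # comparing them as strings orders them chronologically.
    newest = max(candidates, key=lambda note: note["entry_date"])
    return newest[field]


# Illustrative notes for a single person record (not real data).
notes = [
    {
        "note_record_id": "example.org/n1",
        "entry_date": "2005-09-01T12:00:00Z",
        "last_known_location": "Superdome, New Orleans, LA",
        "text": "Seen by a neighbor on Thursday.",
    },
    {
        "note_record_id": "example.org/n2",
        "entry_date": "2005-09-03T08:15:00Z",
        "found": "true",
        "text": "Spoke with her by phone this morning.",
    },
    {
        "note_record_id": "example.org/n3",
        "entry_date": "2005-09-02T18:30:00Z",
        "last_known_location": "Astrodome shelter, Houston, TX",
        "text": "Listed on the shelter's intake sheet.",
    },
]
```

With these notes, latest_value(notes, "last_known_location") selects the value from note n3, because note n2 is newer but leaves that field empty.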

4. XML Format Specification

The XML Namespace for PFIF is:

http://zesty.ca/pfif/1.0

The XML Schema for PFIF is located at:

http://zesty.ca/pfif/pfif-draft.xsd

The MIME type for a PFIF document is:

application/pfif+xml

The XML Schema is a straightforward translation of the data model into two complex types: Person and Note. A valid PFIF document consists of a single person element containing zero or more note elements.

All of the date fields have the XML Schema datatype dateTime. The URL fields have the datatype anyURI. The found field has the datatype boolean. The home_zip field has the datatype integer. All other fields have the datatype string.

In a person element, the fields person_record_id, first_name, and last_name are mandatory. All other fields are optional.

In a note element, the fields note_record_id, entry_date, author_name, and text are mandatory. All other fields are optional.
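As a rough illustration, the following Python sketch builds a person element with nested note elements and enforces the mandatory fields listed above. It assumes that each field becomes a child element named after the field, in the PFIF namespace, and that field values are already serialized as strings. It does not validate against the XML Schema, and any element ordering the schema may require is not checked. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

PFIF_NS = "http://zesty.ca/pfif/1.0"
ET.register_namespace("pfif", PFIF_NS)  # recommended prefix

PERSON_REQUIRED = ("person_record_id", "first_name", "last_name")
NOTE_REQUIRED = ("note_record_id", "entry_date", "author_name", "text")


def build_element(tag, fields, required):
    """Build one PFIF element from a dict of already-serialized strings."""
    missing = [name for name in required if not fields.get(name)]
    if missing:
        raise ValueError(f"{tag} is missing mandatory fields: {missing}")
    element = ET.Element(f"{{{PFIF_NS}}}{tag}")
    for name, value in fields.items():
        # Assumption: child elements are named after the data model fields.
        ET.SubElement(element, f"{{{PFIF_NS}}}{name}").text = value
    return element


def build_person(person, notes=()):
    """Build a person element containing zero or more note elements."""
    element = build_element("person", person, PERSON_REQUIRED)
    for note in notes:
        element.append(build_element("note", note, NOTE_REQUIRED))
    return element


doc = build_person(
    {"person_record_id": "example.org/1234",
     "first_name": "KATE", "last_name": "SMITH"},
    [{"note_record_id": "example.org/n1",
      "entry_date": "2005-09-03T08:15:00Z",
      "author_name": "Jane Doe",
      "text": "Spoke with her by phone this morning."}],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

The identifiers under example.org are placeholders. A real exporter would also validate its output against the published schema before serving it.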

5. Atom Feed Specifications

PFIF XML documents can be embedded into Atom 1.0 feeds. The PFIF document should be embedded using an XML namespace and inserted as an immediate child of the entry element.

Atom 1.0 defines a top-level feed element that contains any number of entry elements. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:pfif="http://zesty.ca/pfif/1.0">
...
</feed>

The rest of this section offers recommendations on how applications should populate the standard Atom elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in Atom elements.

Two kinds of PFIF Atom feeds are defined here: person feeds in which each item is a person, and note feeds in which each item is a note. A person feed is roughly analogous to a blog feed containing blog entries; a note feed is roughly analogous to a comment feed on a particular blog entry. For example, one application might subscribe to a person feed in order to aggregate missing person records from other databases; another application might subscribe to a note feed in order to display a stream of notes with updates about a particular person.

5.1. Atom Person Feeds

An Atom person feed provides at least the following elements within the feed element:

id: This element should contain a unique URI associated with this feed. This might be the URL to the website that corresponds to the database or service providing this feed.
title: This element should contain the name of this feed. This should include the title of the database or service providing this feed.
subtitle: This element should contain a phrase or sentence describing this feed. This would be the place to explain how this feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://www.familylinks.icrc.org/".
updated: This element should contain the date and time that this feed was last updated, in UTC, given in "yyyy-mm-ddThh:mm:ssZ" format.
link: This element should contain a URL from which this feed can be retrieved. This element should have a rel attribute whose value is self.

An Atom person feed provides at least the following elements within each entry element:

pfif:person: This element is the top-level person element of the PFIF document. This element contains elements for the fields of the person record and may contain zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
id: This element should contain a URI string consisting of the scheme "pfif:" followed by the value of the person_record_id field.
title: This element should contain the value of the first_name field, followed by a space and the value of the last_name field in the person record.
author: This element should contain a name element containing the value of the author_name field and an email element containing the value of the author_email field in the person record.
updated: This element should contain the value of the entry_date field in the person record.
content: This element should contain a human-readable HTML formatting of the information in the person record. It is up to the application to decide how to format the content.
source: This element should contain a copy of the title element of this feed. (The feed title would become the source_name field in the person record in an application importing this feed.) This element may also contain copies of any other child elements of the feed element.
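The entry-level recommendations above could be applied along the following lines. This is a sketch only: the function person_entry and its parameters are hypothetical, the person record is a plain dictionary of strings, and the record is assumed to carry an entry_date and author_name even though PFIF itself makes them optional.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
PFIF_NS = "http://zesty.ca/pfif/1.0"
ET.register_namespace("", ATOM_NS)      # Atom as the default namespace
ET.register_namespace("pfif", PFIF_NS)  # recommended prefix


def atom(tag):
    """Return a tag name qualified with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def person_entry(person, feed_title, html_summary):
    """Build an Atom entry element for one person record."""
    entry = ET.Element(atom("entry"))
    # Embedded PFIF document as an immediate child of the entry.
    pfif_person = ET.SubElement(entry, f"{{{PFIF_NS}}}person")
    for name, value in person.items():
        ET.SubElement(pfif_person, f"{{{PFIF_NS}}}{name}").text = value
    ET.SubElement(entry, atom("id")).text = "pfif:" + person["person_record_id"]
    ET.SubElement(entry, atom("title")).text = (
        person["first_name"] + " " + person["last_name"])
    author = ET.SubElement(entry, atom("author"))
    ET.SubElement(author, atom("name")).text = person["author_name"]
    if person.get("author_email"):
        ET.SubElement(author, atom("email")).text = person["author_email"]
    ET.SubElement(entry, atom("updated")).text = person["entry_date"]
    # ElementTree escapes the HTML markup when the entry is serialized.
    content = ET.SubElement(entry, atom("content"), {"type": "html"})
    content.text = html_summary
    source = ET.SubElement(entry, atom("source"))
    ET.SubElement(source, atom("title")).text = feed_title
    return entry


entry = person_entry(
    {"person_record_id": "example.org/1234",
     "entry_date": "2005-09-03T08:15:00Z",
     "author_name": "Jane Doe",
     "author_email": "jane@example.org",
     "first_name": "KATE", "last_name": "SMITH"},
    feed_title="Example Finder",
    html_summary="<p>KATE SMITH</p>",
)
```

All names and addresses in the usage example are placeholders. An importing application would read the pfif:person child and treat the Atom elements only as a convenience for feed readers.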

5.2. Atom Note Feeds

An Atom note feed provides at least the following elements within the feed element:

id: This element should contain a unique URI associated with this feed. This might be the URL to the website that corresponds to the database or service providing this feed.
title: This element should contain the name of this feed. This should include the title of the database or service providing this feed, followed by a more specific title that describes how the notes were selected from the database or service. For example, for a note feed about a particular person, the title could be the title of the service followed by the first name and last name of the person in question.
subtitle: This element should contain a phrase or sentence describing this feed. This would be the place to explain how this feed is produced, for example: "Exported by CiviCRM 1.1, http://www.example.org/."
updated: This element should contain the date and time that this feed was last updated, in UTC, given in "yyyy-mm-ddThh:mm:ssZ" format.
link: This element should contain a URL from which this feed can be retrieved. This element should have a rel attribute whose value is self.

An Atom note feed provides at least the following elements within each entry element:

pfif:person: This element is the top-level person element of the PFIF document. In a note feed, this element would contain just the mandatory fields person_record_id, first_name, and last_name, and a single pfif:note element containing the note.
id: This element should contain a URI string consisting of the scheme "pfif:" followed by the value of the note_record_id field.
title: This element should contain an excerpt of the text field.
author: This element should contain a name element containing the value of the author_name field and an email element containing the value of the author_email field in the note record.
updated: This element should contain the value of the entry_date field in the note record.
content: This element should contain an HTML formatting of the text field in the note record. It is up to the application to decide how to format the content.

6. RSS Feed Specifications

PFIF XML documents can be embedded into RSS 2.0 feeds. (In RSS 2.0 terminology, this section defines an RSS 2.0 module.) The PFIF document should be specified using an XML namespace and embedded as an immediate child of the item element.

RSS 2.0 defines two main elements, channel and item, that are enclosed in a top-level rss element. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:

<rss version="2.0" xmlns:pfif="http://zesty.ca/pfif/1.0">
...
</rss>

The rest of this section offers recommendations on how applications should populate the standard RSS elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in RSS elements.

As in the preceding section, two kinds of PFIF RSS feeds are defined here: person feeds in which each item is a person, and note feeds in which each item is a note.

6.1. RSS Person Feeds

An RSS person feed provides at least the following elements within the channel element:

title: This element should contain the name of this feed. This should include the title of the database or service providing this feed.
description: This element should contain a phrase or sentence describing this feed. This would be the place to explain how this feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://www.familylinks.icrc.org/".
link: This element should contain a URL to the website that corresponds to the database or service providing this feed.

An RSS person feed provides at least the following elements within each item element:

pfif:person: This element is the top-level person element of the PFIF document. This element contains elements for the fields of the person record and may contain zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
guid: This element should contain the value of the person_record_id field.
title: This element should contain the value of the first_name field, followed by a space and the value of the last_name field in the person record.
author: This element should contain the value of the author_email field, followed by a space and the value of the author_name field enclosed in parentheses.
pubDate: This element should contain the date in the entry_date field in the person record, converted to RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT". The timezone MUST be GMT and the year MUST have four digits.
description: This element should contain a human-readable HTML formatting of the information in the person record. It is up to the application to decide how to format the description.
source: This element should contain the name of the feed, the same as the title child of the channel element. (This would become the source_name field in the person record in an application importing this feed.)
link: This element should contain a permanent URL to a web page for the person record, if this application provides one. (This is not the same as the source_url field in the person record in the application producing the feed. This would become the source_url field in the person record in an application importing this feed.)
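The pubDate and author conventions above can be sketched in Python using the standard library. The function names rss_pub_date and rss_author are illustrative, and the e-mail address in the usage note is a placeholder.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_pub_date(entry_date: str) -> str:
    """Convert a "yyyy-mm-ddThh:mm:ssZ" date to the RSS pubDate form."""
    parsed = datetime.strptime(entry_date, "%Y-%m-%dT%H:%M:%SZ")
    utc = parsed.replace(tzinfo=timezone.utc)
    # usegmt=True writes the zone as "GMT"; the year always has four digits.
    return format_datetime(utc, usegmt=True)


def rss_author(author_email: str, author_name: str) -> str:
    """Format the author element: e-mail, space, name in parentheses."""
    return f"{author_email} ({author_name})"
```

For instance, rss_pub_date("2002-09-07T00:00:01Z") gives the example date shown above, and rss_author("jane@example.org", "Jane Doe") gives "jane@example.org (Jane Doe)".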

6.2. RSS Note Feeds

An RSS note feed provides at least the following elements within the channel element:

title: This element should contain the name of this feed. This should include the title of the database or service providing this feed, followed by a more specific title that describes how the notes were selected from the database or service. For example, for a note feed about a particular person, the title could be the title of the service followed by the first name and last name of the person in question.
description: This element should contain a phrase or sentence describing the feed. This would be the place to explain how the feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://www.familylinks.icrc.org/".
link: This element should contain a URL to the website that corresponds to the database or service providing this feed. For a note feed about a particular person, this link could point to the web page for that person's record.

An RSS note feed provides at least the following elements within each item element:

pfif:person: This element is the top-level person element of the PFIF document. In a note feed, this element would contain just the mandatory fields person_record_id, first_name, and last_name, and a single pfif:note element containing the note.
guid: This element should contain the value of the note_record_id field.
author: This element should contain the value of the author_email field, followed by a space and the value of the author_name field enclosed in parentheses.
pubDate: This element should contain the date in the entry_date field in the note record, converted to RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT". The timezone MUST be GMT and the year MUST have four digits.
description: This element should contain an HTML formatting of the text field in the note record. It is up to the application to decide how to format the description.

7. Suggested Relational Database Schema

This section suggests a possible relational database schema for storing PFIF data. The exact details of a database design are up to each application; this is just a possible starting point.

A relational database could store PFIF records in two tables, person and note, for the two types of records. Rows would only be added to these tables; rows would never be modified or removed. To record the fact that data is changed, a timestamped row is added to the note table.

PERSON table:
     string    person_record_id       primary key
     date      entry_date
     string    author_name
     string    author_email
     string    author_phone
     string    source_name
     date      source_date
     string    source_url
     string    first_name
     string    last_name
     string    home_city
     string    home_state
     string    home_neighborhood
     string    home_street
     int       home_zip
     string    photo_url
     text      other

NOTE table:
     string    note_record_id         primary key
     string    person_record_id       foreign key not null
     string    linked_person_id       foreign key or null
     date      entry_date
     string    author_name
     string    author_email
     string    author_phone
     bool      found
     string    email_of_found_person
     string    phone_of_found_person
     string    last_known_location
     text      text

This suggested schema defines the person table to match the person record in the PFIF data model exactly, and adds two fields to the note table beyond those in the PFIF note record. The first extra field, person_record_id, links each note to a person in the person table. The second extra field, linked_person_id, allows the application to indicate that two person records refer to the same person.
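One possible rendering of this schema, using SQLite through Python's sqlite3 module, is sketched below. The column types are an assumption of this sketch: dates are stored as "yyyy-mm-ddThh:mm:ssZ" text so that string comparison matches chronological order, and found is stored as an integer. The function names open_database and persons_since are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    person_record_id  TEXT PRIMARY KEY,
    entry_date        TEXT,
    author_name       TEXT,
    author_email      TEXT,
    author_phone      TEXT,
    source_name       TEXT,
    source_date       TEXT,
    source_url        TEXT,
    first_name        TEXT,
    last_name         TEXT,
    home_city         TEXT,
    home_state        TEXT,
    home_neighborhood TEXT,
    home_street       TEXT,
    home_zip          INTEGER,
    photo_url         TEXT,
    other             TEXT
);
CREATE TABLE note (
    note_record_id        TEXT PRIMARY KEY,
    person_record_id      TEXT NOT NULL REFERENCES person(person_record_id),
    linked_person_id      TEXT REFERENCES person(person_record_id),
    entry_date            TEXT,
    author_name           TEXT,
    author_email          TEXT,
    author_phone          TEXT,
    found                 INTEGER,
    email_of_found_person TEXT,
    phone_of_found_person TEXT,
    last_known_location   TEXT,
    text                  TEXT
);
"""


def open_database(path=":memory:"):
    """Create the two tables and enable foreign key enforcement."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def persons_since(conn, last_entry_date):
    """Ids of person records entered after the given entry_date."""
    rows = conn.execute(
        "SELECT person_record_id FROM person "
        "WHERE entry_date > ? ORDER BY entry_date",
        (last_entry_date,),
    )
    return [row[0] for row in rows]
```

The persons_since query is the kind of incremental update that a monotonically increasing entry_date makes possible. The application only ever issues INSERT statements against these tables, never UPDATE or DELETE.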

To link a foreign person record with a local person record, the application adds a note associated with the local person record, with a linked_person_id field containing the person_record_id of the foreign record. The other fields of the note would store the date that the merge occurred, information about the person who decided to do the merge (if it was decided by a person), and a description of why the merge was done.

When displaying a person record, the application can then look for all the non-empty linked_person_id fields among the notes that belong to that person record, and display all the linked records or a merged view of the linked records.
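The linking procedure just described can be sketched in Python over notes held as plain dictionaries. The function names make_link_note and linked_person_ids, and all identifiers and names in the sample data, are hypothetical. The sketch follows only direct links from one person record and does not chase links transitively.

```python
def make_link_note(note_record_id, local_person_id, foreign_person_id,
                   entry_date, author_name, reason):
    """Build a note recording that two person records refer to one person."""
    return {
        "note_record_id": note_record_id,
        "person_record_id": local_person_id,    # the local record
        "linked_person_id": foreign_person_id,  # the foreign record
        "entry_date": entry_date,               # when the merge was decided
        "author_name": author_name,             # who decided it
        "text": reason,                         # why the merge was done
    }


def linked_person_ids(notes, person_record_id):
    """List the records linked from one person record, oldest link first."""
    linked = []
    for note in sorted(notes, key=lambda n: n["entry_date"]):
        target = note.get("linked_person_id")
        if (note["person_record_id"] == person_record_id
                and target and target not in linked):
            linked.append(target)
    return linked


notes = [
    {"note_record_id": "example.org/n1",
     "person_record_id": "example.org/1234",
     "entry_date": "2005-09-02T10:00:00Z",
     "author_name": "Jane Doe",
     "text": "Seen at the shelter."},
    make_link_note("example.org/n2", "example.org/1234", "example.net/p-77",
                   "2005-09-03T09:00:00Z", "Jane Doe",
                   "Same name, street and photo."),
]
```

Here linked_person_ids(notes, "example.org/1234") yields the single foreign record example.net/p-77, which the application could then display alongside the local record.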

It is strongly recommended that, even when an application decides that multiple person records refer to the same person, it should not attempt to merge the records in place. Instead, the application should retain all the received records and just present a merged display of them. Keeping the original records maintains accountability and makes it possible for the application to handle future imports of the same records from their original sources.

8. Acknowledgements

Thanks to the CiviCRM team and Kieran Lal for the initial data model on which this specification is based. Thanks to Jonathan Plax for writing the XML Schema document for PFIF.
