Editor: Ka-Ping Yee <pingzestyca>
This document is licensed under the GNU Free Documentation License 1.2.
(Still to be done: examples of PFIF documents, Atom and RSS feeds, and a worked example of importing and merging records into a database. These might not need to be included within this specification.)
This document defines the People Finder Interchange Format, which encompasses both a data model and an XML-based exchange format for sharing data about people who are missing or displaced by natural or human-made disasters. The data model is first described in a manner independent of implementation style (object-oriented, relational, or XML). Then the PFIF proper is specified by an XML Schema. This document also provides a recommended schema for handling PFIF data in a relational database, though such implementation decisions are ultimately up to application developers.
There are two types of records: person records hold static information, and note records hold information that changes. Each note record belongs to a particular person, and a person record may have any number of associated note records. Once a record is created, it is never changed. To indicate that data about a particular person has changed, add a timestamped note record associated with that person record.
person records may be created both by those who seek a missing person and by those who have information on a missing person. The person record for a person is the point of convergence for all parties; the note records on that person are the growing pool of shared knowledge.
A person record contains 17 fields. There may be multiple person records for the same person. In fact, any given application that imports data from multiple sources is likely to have multiple person records for the same person. It is up to the application to associate the records (see the Database Implementation section below). It is recommended that applications keep copies of all the records, and separately keep track of which records correspond to the same person.
Meta-information like this is essential because it allows people to trace and ascertain the reliability of the data they are looking at, which was a big problem with survivor databases for September 11.
These fields are specifically for identifying the person and should be for data that never changes. These are the fields to search on. Insisting on all capitals and no accents is ugly, but it makes searches more likely to converge on the correct record. The other field is a very crude way to import foreign data, but the formatting guidelines should make it possible to extract the data again if there is a desperate need to do so. For other, free-form text was chosen instead of XML to make it easy for an application to display other directly in the UI.
icrc.org/birthdate: 1976-02-26
A description of the person in free-form text can also go here, with the field name "description". For example:
description:
Dark hair, in her late thirties.
Also goes by the names "Kate" or "Katie".
Each note record belongs to exactly one person record. There may be any number of note records associated with a particular person record. (See below for implementation notes. A database might implement this by including a foreign key, person_record_id, that refers to the person record. An object-oriented representation might implement this by embedding a list of note objects within the person object.)
Not being able to remove or update records was a huge problem with September 11 survivor databases. note records resolve this problem while avoiding the problem of synchronizing conflicting changes. Every note has a timestamp and information on the author of the note. Applications can use the timestamp to determine the most recent value of a given field. Users can use the author information to ascertain the reliability of a given field.
The found, email_of_found_person, phone_of_found_person and last_known_location fields store data that changes over time. When these fields are present in a note record, the record is specifying new values for these fields, and the entry_date field indicates the date that the new values took effect. So, for example, an application that wants to display the most recent known location can look for the note with the latest entry_date that has a non-empty last_known_location field.
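The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Note class models a handful of the note fields, and the helper name latest_value, the record IDs, and the sample data are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a record is never changed once created
class Note:
    """Hypothetical in-memory note record; only some PFIF fields are modeled."""
    note_record_id: str
    entry_date: datetime
    author_name: str
    text: str
    found: Optional[bool] = None        # None means the note does not set it
    last_known_location: str = ""       # empty means the note does not set it


def latest_value(notes, field_name):
    """Return field_name from the newest note that sets it, or None."""
    candidates = [n for n in notes if getattr(n, field_name) not in (None, "")]
    if not candidates:
        return None
    newest = max(candidates, key=lambda n: n.entry_date)
    return getattr(newest, field_name)


# Made-up sample notes for one person record.
notes = [
    Note("example.org/n1", datetime(2005, 9, 3, 12, 0, tzinfo=timezone.utc),
         "A. Author", "Seen at the shelter.", last_known_location="Shelter A"),
    Note("example.org/n2", datetime(2005, 9, 5, 8, 30, tzinfo=timezone.utc),
         "B. Author", "Moved to a relative's home.",
         last_known_location="Baton Rouge"),
    Note("example.org/n3", datetime(2005, 9, 6, 9, 0, tzinfo=timezone.utc),
         "C. Author", "Spoke by phone; she is safe.", found=True),
]

current_location = latest_value(notes, "last_known_location")
current_found = latest_value(notes, "found")
```

Because the newest note (n3) leaves last_known_location empty, the location is taken from n2, while found comes from n3. Each field is resolved independently of the others.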
The XML Namespace for PFIF is:
http://zesty.ca/pfif/1.0
The XML Schema for PFIF is located at:
The MIME type for a PFIF document is:
application/pfif+xml
The XML Schema is a straightforward translation of the data model into two complex types: Person and Note. A valid PFIF document consists of a single person element containing zero or more note elements.
All of the date fields have the XML Schema datatype dateTime. The URL fields have the datatype anyURI. The found field has the datatype boolean. The home_zip field has the datatype integer. All other fields have the datatype string.
In a person element, the fields person_record_id, first_name, and last_name are mandatory. All other fields are optional.
In a note element, the fields note_record_id, entry_date, author_name, and text are mandatory. All other fields are optional.
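As a rough illustration of the mandatory-field rules, the following Python sketch parses a small PFIF document and reports missing fields. It is not a substitute for validating against the XML Schema. It assumes that every element lives in the PFIF namespace used in the feed examples of this document; the helper names and the sample record are invented for the example.

```python
import xml.etree.ElementTree as ET

PFIF_NS = "http://zesty.ca/pfif/1.0"  # namespace used in the feed examples
PERSON_REQUIRED = ("person_record_id", "first_name", "last_name")
NOTE_REQUIRED = ("note_record_id", "entry_date", "author_name", "text")


def q(tag):
    """Qualify a tag name with the PFIF namespace for ElementTree."""
    return "{%s}%s" % (PFIF_NS, tag)


def missing_fields(element, required):
    """List the required child fields that are absent or empty in element."""
    return [name for name in required
            if not (element.findtext(q(name)) or "").strip()]


def check_document(person):
    """Return a list of problems found in a parsed person element."""
    problems = ["person: missing " + name
                for name in missing_fields(person, PERSON_REQUIRED)]
    for index, note in enumerate(person.findall(q("note"))):
        problems += ["note %d: missing %s" % (index, name)
                     for name in missing_fields(note, NOTE_REQUIRED)]
    return problems


# Made-up document: the note deliberately omits the mandatory text field.
SAMPLE = """\
<pfif:person xmlns:pfif="http://zesty.ca/pfif/1.0">
  <pfif:person_record_id>example.org/p1</pfif:person_record_id>
  <pfif:first_name>KATHERINE</pfif:first_name>
  <pfif:last_name>DOE</pfif:last_name>
  <pfif:note>
    <pfif:note_record_id>example.org/n1</pfif:note_record_id>
    <pfif:entry_date>2005-09-03T09:21:12Z</pfif:entry_date>
    <pfif:author_name>Example Author</pfif:author_name>
    <pfif:found>true</pfif:found>
  </pfif:note>
</pfif:person>
"""

problems = check_document(ET.fromstring(SAMPLE))
```

Here problems contains a single entry reporting that note 0 lacks its text field. The check does not look at datatypes or element order, which only the schema can verify.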
PFIF XML documents can be embedded into Atom 1.0 feeds. The PFIF document should be embedded using an XML namespace and inserted as an immediate child of the entry element.
Atom 1.0 defines a top-level feed element that contains any number of entry elements. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:pfif="http://zesty.ca/pfif/1.0"> ... </feed>
The rest of this section offers recommendations on how applications should populate the standard Atom elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in Atom elements.
Two kinds of PFIF Atom feeds are defined here: person feeds in which each item is a person, and note feeds in which each item is a note. A person feed is roughly analogous to a blog feed containing blog entries; a note feed is roughly analogous to a comment feed on a particular blog entry. For example, one application might subscribe to a person feed in order to aggregate missing person records from other databases; another application might subscribe to a note feed in order to display a stream of notes with updates about a particular person.
An Atom person feed provides at least the following elements within the feed element:
id
title
subtitle
updated
link
    Carries a rel attribute whose value is self.
An Atom person feed provides at least the following elements within each entry element:
pfif:person
    The person element of the PFIF document. This element contains elements for the fields of the person record and may contain zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
id
title
author
    Contains a name element containing the value of the author_name field and an email element containing the value of the author_email field in the person record.
updated
content
source
    Contains the title element of this feed. (The feed title would become the source_name field in the person record in an application importing this feed.) This element may also contain copies of any other child elements of the feed element.
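The following Python sketch shows one way an exporter might assemble such an entry with the standard library. Only the author and source mappings come from the list above. The choice of title (first and last name), the use of entry_date for updated, and the caller-supplied id and content text are assumptions made for this example, as are the function names and sample values.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
PFIF_NS = "http://zesty.ca/pfif/1.0"
# Serialize Atom as the default namespace and PFIF with the recommended prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("pfif", PFIF_NS)


def atom(tag):
    return "{%s}%s" % (ATOM_NS, tag)


def pfif(tag):
    return "{%s}%s" % (PFIF_NS, tag)


def person_entry(person, entry_id, summary, feed_title):
    """Build an Atom entry for one person record (a dict of PFIF fields)."""
    entry = ET.Element(atom("entry"))
    # The embedded PFIF document is an immediate child of the entry.
    record = ET.SubElement(entry, pfif("person"))
    for name, value in person.items():
        ET.SubElement(record, pfif(name)).text = value
    ET.SubElement(entry, atom("id")).text = entry_id  # assumption: caller picks it
    ET.SubElement(entry, atom("title")).text = "%s %s" % (
        person["first_name"], person["last_name"])    # assumption
    author = ET.SubElement(entry, atom("author"))
    ET.SubElement(author, atom("name")).text = person["author_name"]
    ET.SubElement(author, atom("email")).text = person["author_email"]
    ET.SubElement(entry, atom("updated")).text = person["entry_date"]  # assumption
    ET.SubElement(entry, atom("content")).text = summary
    source = ET.SubElement(entry, atom("source"))
    ET.SubElement(source, atom("title")).text = feed_title
    return entry


# Made-up person record.
person = {
    "person_record_id": "example.org/p1",
    "entry_date": "2005-09-03T09:21:12Z",
    "author_name": "Example Author",
    "author_email": "author@example.org",
    "first_name": "KATHERINE",
    "last_name": "DOE",
}
entry = person_entry(person, "http://example.org/person/p1",
                     "Missing person record for Katherine Doe.",
                     "Example person feed")
entry_xml = ET.tostring(entry, encoding="unicode")
```

An importer should read the person data from the pfif:person element and treat the Atom elements only as a convenience for generic feed readers.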
An Atom note feed provides at least the following elements within the feed element:
id
title
subtitle
updated
link
    Carries a rel attribute whose value is self.
An Atom note feed provides at least the following elements within each entry element:
pfif:person
    The person element of the PFIF document. In a note feed, this element would contain just the mandatory fields person_record_id, first_name, and last_name, and a single pfif:note element containing the note.
id
title
author
    Contains a name element containing the value of the author_name field and an email element containing the value of the author_email field in the note record.
updated
content
PFIF XML documents can be embedded into RSS 2.0 feeds. (In RSS 2.0 terminology, this section defines an RSS 2.0 module.) The PFIF document should be specified using an XML namespace and embedded as an immediate child of the item element.
RSS 2.0 defines two main elements, channel and item, that are enclosed in a top-level rss element. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:
<rss version="2.0" xmlns:pfif="http://zesty.ca/pfif/1.0"> ... </rss>
The rest of this section offers recommendations on how applications should populate the standard RSS elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in RSS elements.
As in the preceding section, two kinds of PFIF RSS feeds are defined here: person feeds in which each item is a person, and note feeds in which each item is a note.
An RSS person feed provides at least the following elements within the channel element:
title
description
link
An RSS person feed provides at least the following elements within each item element:
pfif:person
    The person element of the PFIF document. This element contains elements for the fields of the person record and may contain zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
guid
title
author
pubDate
description
source
    The title child of the channel element. (This would become the source_name field in the person record in an application importing this feed.)
link
An RSS note feed provides at least the following elements within the channel element:
title
description
link
An RSS note feed provides at least the following elements within each item element:
pfif:person
    The person element of the PFIF document. In a note feed, this element would contain just the mandatory fields person_record_id, first_name, and last_name, and a single pfif:note element containing the note.
guid
author
pubDate
description
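On the consuming side, an importer might read a note feed roughly as in the Python sketch below. It takes all record data from the embedded PFIF document and ignores the redundant RSS elements, in line with the precedence rule stated earlier in this section. The feed content, the values placed in the standard RSS elements, and the function name read_notes are made up for the example.

```python
import xml.etree.ElementTree as ET

PFIF = {"pfif": "http://zesty.ca/pfif/1.0"}

# Made-up RSS note feed with one item.
FEED = """\
<rss version="2.0" xmlns:pfif="http://zesty.ca/pfif/1.0">
  <channel>
    <title>Example note feed</title>
    <description>Made-up feed for illustration</description>
    <link>http://example.org/notes.rss</link>
    <item>
      <pfif:person>
        <pfif:person_record_id>example.org/p1</pfif:person_record_id>
        <pfif:first_name>KATHERINE</pfif:first_name>
        <pfif:last_name>DOE</pfif:last_name>
        <pfif:note>
          <pfif:note_record_id>example.org/n1</pfif:note_record_id>
          <pfif:entry_date>2005-09-03T09:21:12Z</pfif:entry_date>
          <pfif:author_name>Example Author</pfif:author_name>
          <pfif:text>Seen at the shelter on Friday.</pfif:text>
        </pfif:note>
      </pfif:person>
      <guid>example.org/n1</guid>
      <author>Example Author</author>
      <pubDate>Sat, 03 Sep 2005 09:21:12 GMT</pubDate>
      <description>Redundant text that the importer ignores</description>
    </item>
  </channel>
</rss>
"""


def read_notes(rss_text):
    """Yield (person_record_id, note fields) pairs, using only PFIF data."""
    root = ET.fromstring(rss_text)
    for item in root.findall("./channel/item"):
        person = item.find("pfif:person", PFIF)
        if person is None:
            continue  # item carries no embedded PFIF document
        person_id = person.findtext("pfif:person_record_id", namespaces=PFIF)
        for note in person.findall("pfif:note", PFIF):
            # Strip the namespace part of each tag to get the field name.
            fields = {child.tag.rsplit("}", 1)[-1]: child.text for child in note}
            yield person_id, fields


notes = list(read_notes(FEED))
```

Each resulting pair carries the person_record_id that the note belongs to, which is what an importer needs in order to attach the note to the right person record.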
This section suggests a possible relational database schema for storing PFIF data. The exact details of a database design are up to each application; this is just a possible starting point.
A relational database could store PFIF records in two tables, person and note, for the two types of records. Rows would only be added to these tables; rows would never be modified or removed. To record the fact that data is changed, a timestamped row is added to the note table.
PERSON table:
    string  person_record_id        primary key
    date    entry_date
    string  author_name
    string  author_email
    string  author_phone
    string  source_name
    string  source_date
    string  source_url
    string  first_name
    string  last_name
    string  home_city
    string  home_state
    string  home_neighborhood
    string  home_street
    int     home_zip
    string  photo_url
    text    other

NOTE table:
    string  note_record_id          primary key
    string  person_record_id        foreign key not null
    string  linked_person_id        foreign key or null
    date    entry_date
    string  author_name
    string  author_email
    string  author_phone
    bool    found
    string  email_of_found_person
    string  phone_of_found_person
    string  last_known_location
    text    text
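For concreteness, here is a sketch of the suggested tables in SQLite, created from Python. The column types are one possible mapping (dates as ISO 8601 strings, bool as 0 or 1). The triggers that reject updates and deletes are an assumption added to illustrate the append-only rule; they are not part of the suggested schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    person_record_id  TEXT PRIMARY KEY,
    entry_date        TEXT,     -- dates kept as ISO 8601 strings (assumption)
    author_name       TEXT,
    author_email      TEXT,
    author_phone      TEXT,
    source_name       TEXT,
    source_date       TEXT,
    source_url        TEXT,
    first_name        TEXT,
    last_name         TEXT,
    home_city         TEXT,
    home_state        TEXT,
    home_neighborhood TEXT,
    home_street       TEXT,
    home_zip          INTEGER,
    photo_url         TEXT,
    other             TEXT
);
CREATE TABLE note (
    note_record_id        TEXT PRIMARY KEY,
    person_record_id      TEXT NOT NULL REFERENCES person (person_record_id),
    linked_person_id      TEXT REFERENCES person (person_record_id),
    entry_date            TEXT,
    author_name           TEXT,
    author_email          TEXT,
    author_phone          TEXT,
    found                 INTEGER,  -- SQLite has no bool type; 0 or 1
    email_of_found_person TEXT,
    phone_of_found_person TEXT,
    last_known_location   TEXT,
    text                  TEXT
);
"""


def open_database(path=":memory:"):
    """Create the two tables plus hypothetical append-only triggers."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    for table in ("person", "note"):
        for action in ("UPDATE", "DELETE"):
            db.execute(
                "CREATE TRIGGER %s_no_%s BEFORE %s ON %s "
                "BEGIN SELECT RAISE(ABORT, 'rows are append-only'); END"
                % (table, action.lower(), action, table))
    return db


db = open_database()
db.execute(
    "INSERT INTO person (person_record_id, first_name, last_name) "
    "VALUES (?, ?, ?)",
    ("example.org/p1", "KATHERINE", "DOE"))  # made-up record
```

With the triggers in place, any later UPDATE or DELETE on either table fails, so a change can only be recorded by inserting a new row into the note table.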
This suggested schema defines the person table exactly to match person in the PFIF data model, and in the note table adds two fields to the note in the PFIF data model. The first extra field, person_record_id, links each note to a person in the person table. The second extra field, linked_person_id, allows the application to indicate that two person records refer to the same person.
To link a foreign person record with a local person record, the application adds a note associated with the local person record, with a linked_person_id field containing the person_record_id of the foreign record. The other fields of the note would store the date that the merge occurred, information about the person who decided to do the merge (if it was decided by a person), and a description of why the merge was done.
When displaying a person record, the application can then look for all the non-empty linked_person_id fields among the notes that belong to that person record, and display all the linked records or a merged view of the linked records.
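The linking step and the lookup described in the last two paragraphs might look like this in Python with SQLite. The tables are cut down to the columns the example needs, and the function names, record IDs, and sample values are invented. The lookup follows only direct links recorded on the given person record.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (
    person_record_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT
);
CREATE TABLE note (
    note_record_id   TEXT PRIMARY KEY,
    person_record_id TEXT NOT NULL,
    linked_person_id TEXT,
    entry_date TEXT, author_name TEXT, text TEXT
);
""")


def link_records(db, note_id, local_id, foreign_id, when, who, why):
    """Record a merge decision as a new note on the local person record."""
    db.execute(
        "INSERT INTO note (note_record_id, person_record_id, linked_person_id,"
        " entry_date, author_name, text) VALUES (?, ?, ?, ?, ?, ?)",
        (note_id, local_id, foreign_id, when, who, why))


def linked_person_ids(db, person_id):
    """Return the IDs of records linked from the notes of person_id."""
    rows = db.execute(
        "SELECT DISTINCT linked_person_id FROM note "
        "WHERE person_record_id = ? "
        "AND linked_person_id IS NOT NULL AND linked_person_id <> '' "
        "ORDER BY linked_person_id",
        (person_id,))
    return [row[0] for row in rows]


# Made-up data: one local record and one imported (foreign) record.
db.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("local.example/p1", "KATHERINE", "DOE"),
     ("other.example/p9", "KATE", "DOE")])
link_records(db, "local.example/n1", "local.example/p1", "other.example/p9",
             "2005-09-07T10:00:00Z", "Example Volunteer",
             "Same name and home street; photo matches.")

linked = linked_person_ids(db, "local.example/p1")
```

Both person rows stay in the table untouched. The application can then fetch each linked record and show it alongside the local one, or build a merged view at display time.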
It is strongly recommended that, even when an application decides that multiple person records refer to the same person, it should not attempt to merge the records in place. Instead, the application should retain all the received records and just present a merged display of them. Keeping the original records maintains accountability and makes it possible for the application to handle future imports of the same records from their original sources.
Thanks to the CiviCRM team and Kieran Lal for the initial data model on which this specification is based. Thanks to Jonathan Plax for writing the XML Schema document for PFIF.