PFIF 1.3 Specification

March 7, 2011
Editor: Ka-Ping Yee (ping@zesty.ca)

URL of this specification: http://zesty.ca/pfif/1.3
FAQ, examples, and other information on PFIF: http://zesty.ca/pfif

This document is licensed under the GNU Free Documentation License 1.2.

Abstract
Design principles
Data life cycle
Data model
1. person records
2. note records
XML format specification
Atom feed specifications
1. Atom person feeds
2. Atom note feeds
RSS feed specifications
1. RSS person feeds
2. RSS note feeds
Suggested relational database schema
Changes from previous versions
Acknowledgements

1. Abstract

This document defines the People Finder Interchange Format, which consists of a data model and an XML-based exchange format for sharing data about people who are missing or displaced by natural or human-made disasters. The data model is first described in a manner independent of implementation style (object-oriented, relational, or XML); then the PFIF XML format is specified by a RELAX NG schema. This document also offers an example of a possible relational database schema for PFIF data.

2. Design principles

The purpose of PFIF is to bring people and data together. The design aims to promote convergence: convergence of people who seek the same person, convergence of information about a person obtained from various sources, convergence of duplicated data, and ultimately convergence of missing people with their loved ones.
Data should be traceable. Since data comes from sources of unknown reliability and accountability, information on the origins of data should be maintained, to help users ascertain its trustworthiness.
Each record belongs to an original repository, which is the (PFIF or non-PFIF) repository where the record was first entered. The record may be copied to other places, but the original repository remains the authority on the record. Only the original repository should ever change the contents of a record.
Each aggregator of data has its own perspective on the world and is responsible for choosing which data sources to trust. It is not possible to dictate truths about all data from a single central authority.
Because multiple records might refer to the same person, PFIF allows such records to be associated with each other. But, by the preceding principle, each aggregator makes its own decisions about which records to associate; there is no central authority.
It should be possible to resolve multiple copies of the same record that have been imported via different data paths.
All dates and times must be in UTC, never in a local time zone, because data records will be transmitted among many different time zones. This format uses dates in the RFC 3339 format, with only UTC allowed. Front-ends can convert dates and times to the local time zone for display.
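
As a minimal illustration of the UTC-only rule, the Python sketch below formats an aware datetime in the "yyyy-mm-ddThh:mm:ssZ" form and parses such strings back. The helper names format_pfif_time and parse_pfif_time are hypothetical, not part of PFIF. The parser rejects timestamps that carry a local offset and, for simplicity, discards fractional seconds.

```python
from datetime import datetime, timedelta, timezone
import re

# Lexical form used by PFIF: UTC only, optional fractional seconds, trailing "Z".
PFIF_TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z", re.ASCII)


def format_pfif_time(dt: datetime) -> str:
    """Convert an aware datetime to UTC and format it as yyyy-mm-ddThh:mm:ssZ."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: its time zone is unknown")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_pfif_time(value: str) -> datetime:
    """Parse a PFIF UTC timestamp; any local offset such as +09:00 is rejected."""
    if not PFIF_TIME_RE.fullmatch(value):
        raise ValueError(f"not a PFIF UTC timestamp: {value!r}")
    whole_seconds = value[:-1].split(".")[0]  # drop "Z" and any fractional part
    return datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # 10:30 local time in a UTC+09:00 zone is 01:30 UTC.
    local = datetime(2011, 3, 11, 10, 30, tzinfo=timezone(timedelta(hours=9)))
    print(format_pfif_time(local))  # 2011-03-11T01:30:00Z
```

A front-end would do the reverse for display, calling astimezone() with the viewer's zone on the parsed value.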

3. Data life cycle

Each PFIF repository may contain original records and clone records. An original record is a record residing in its original repository; a clone record is a copy of a record that originated in another repository. The following diagram describes the life of a PFIF record as it is created and then travels to other repositories.

                     .----------------------.
                     | 1. real-world facts  |
                     '----------------------'
                          |             |
       entered by a human |             | entered by a human
   into a PFIF repository |             | into a non-PFIF repository
                          |             |
 entry_date, source_date, |             |
  source_name, source_url |             |
are set by the repository |             |
                          v             v
.------------------------------.   .---------------------------------.
| 2a. original PFIF record in  |   | 2b. original non-PFIF record in |
| record's original repository |   | record's original repository    |
'------------------------------'   '---------------------------------'
                          |             |
       exported as a PFIF |             | parsed and converted to the PFIF
         document or feed |             | data model by a human or program
                          |             |
                          |             | source_date, source_name, source_url
                          |             | are set by the human or program
                          v             v
                        .-----------------.
       .--------------> | 3. PFIF record  |
       |                '-----------------'
       |                        |
       |                        | loaded into a PFIF repository
       |                        |
       |                        | entry_date is set to date/time of import
       |                        v
       |     .--------------------------------------.
       |     | 4. clone record in a PFIF repository |
       |     '--------------------------------------'
       |                        |
       |                        | exported as a PFIF document or feed
       |                        |
       '------------------------'

3.1. Incremental export mechanism

Whenever a PFIF repository adds a new original record or clone record, it must set the entry_date field to the current time. This time value must never decrease as records are added. A client can incrementally update its copy of a repository by querying for all records with an entry_date greater than or equal to the entry_date of the last received record.
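
The following Python sketch shows one way a client might apply this rule. The fetch_since callable stands in for whatever query interface a repository actually offers (a hypothetical name), and timestamps are assumed to share the fixed-width form without fractional seconds, so that string order equals time order. Because entry_date has one-second resolution, several records can share a timestamp; querying with "greater than or equal to" rather than "greater than" presumably exists so that records added later in the same second are not missed, at the cost of re-receiving some records, which the client skips.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class Record:
    record_id: str
    entry_date: str  # "yyyy-mm-ddThh:mm:ssZ"


@dataclass
class IncrementalClient:
    """Mirror a repository by repeatedly asking for entry_date >= the last one seen."""
    fetch_since: Callable[[str], Iterable[Record]]  # hypothetical query interface
    last_entry_date: str = "0000-01-01T00:00:00Z"
    ids_at_last_date: Set[str] = field(default_factory=set)
    records: Dict[str, Record] = field(default_factory=dict)

    def poll(self) -> int:
        """Fetch records added since the last poll; return how many were stored."""
        added = 0
        for rec in sorted(self.fetch_since(self.last_entry_date), key=lambda r: r.entry_date):
            # ">=" re-delivers records stamped exactly last_entry_date; skip those already held.
            if rec.entry_date == self.last_entry_date and rec.record_id in self.ids_at_last_date:
                continue
            self.records[rec.record_id] = rec
            added += 1
            if rec.entry_date > self.last_entry_date:
                self.last_entry_date = rec.entry_date
                self.ids_at_last_date = set()
            self.ids_at_last_date.add(rec.record_id)
        return added
```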

3.2. Data update mechanism

The original repository for a record (2a or 2b in the diagram above) can update any of the fields on a record after it is created, except the person_record_id field. Whenever a PFIF repository creates or updates an original record, it must set both the source_date and entry_date fields to the current time. When a repository imports a PFIF record that has the same record identifier as an existing record, it should keep the version with the latest source_date.
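
A sketch of the "keep the latest source_date" rule in Python, assuming records are plain dicts indexed by their record identifier (an illustrative representation, not one PFIF prescribes). Timestamps are parsed rather than compared as strings so that optional fractional seconds (up to six digits here) order correctly; keeping the existing copy on a tie is an arbitrary choice of this sketch.

```python
from datetime import datetime
from typing import Dict


def _source_time(record: dict) -> datetime:
    """Parse source_date in the form yyyy-mm-ddThh:mm:ss[.ffffff]Z."""
    value = record["source_date"]
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt)


def import_record(store: Dict[str, dict], incoming: dict,
                  id_field: str = "person_record_id") -> bool:
    """Store `incoming` unless the copy already held has the same or a later source_date.

    Returns True if `incoming` was stored.
    """
    record_id = incoming[id_field]
    existing = store.get(record_id)
    if existing is not None and _source_time(existing) >= _source_time(incoming):
        return False
    store[record_id] = incoming
    return True
```

For note records, pass id_field="note_record_id". Per the incremental export mechanism, a repository that stores the incoming copy would also set its entry_date to the time of import, which this sketch leaves out.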

3.3. Data expiry mechanism

If present, the expiry_date field indicates when a record should be deleted to preserve the privacy of the personal information it contains. Conforming PFIF implementations must meet the following requirements:

Within one day after expiry_date, a PFIF repository must make the contents of the person record and any associated note records inaccessible to all external clients, including users and machine API clients.
Thereafter, if the repository exports its data through an API, it should continue to export a placeholder record in the place of the expired person record. This placeholder should keep the same person_record_id and expiry_date values, and have both source_date and entry_date set to the time that the placeholder was created. All other fields should be empty or omitted.
Within 60 days after expiry_date, a PFIF repository must permanently and unrecoverably delete all its copies (including backups) of the contents of the person record and any associated note records, except for the person_record_id, source_date, entry_date, and expiry_date fields needed to produce the placeholder.

To satisfy a user request to delete an existing original record, a PFIF repository should set the record's expiry_date to the current time. (In accordance with the preceding section, it would also set the source_date and entry_date to the current time.) The expiry mechanism described above would then cause the deletion to propagate to other conforming PFIF repositories.
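
The bookkeeping in this section might be sketched in Python as follows. Records are plain dicts, the function names are hypothetical, and the current time is passed in explicitly so the behavior is easy to test. Actually hiding and purging the stored data, including backups, is left to the repository.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def _stamp(now: datetime) -> str:
    """Format an aware datetime as a PFIF UTC timestamp."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def request_deletion(record: dict, now: datetime) -> None:
    """Handle a user's request to delete an original record."""
    stamp = _stamp(now)
    record["expiry_date"] = stamp  # the record expires immediately
    record["source_date"] = stamp  # updating an original record sets both dates
    record["entry_date"] = stamp


def make_placeholder(expired: dict, now: datetime) -> dict:
    """Build the placeholder exported in place of an expired person record."""
    stamp = _stamp(now)
    return {
        "person_record_id": expired["person_record_id"],
        "expiry_date": expired["expiry_date"],
        "source_date": stamp,  # time the placeholder was created
        "entry_date": stamp,
        # every other field is omitted
    }


def deadlines(expiry: datetime) -> Tuple[datetime, datetime]:
    """Latest times to make the contents inaccessible and to purge them."""
    return expiry + timedelta(days=1), expiry + timedelta(days=60)
```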

4. Data model

There are two types of records. person records are for information that identifies a person. note records are for information about the current status of a person. Each note record belongs to a particular person, and a person record may have any number of associated note records.

person records may be created both by those who seek a missing person and by those who have information on a missing person. The person record for a person is the point of convergence for all parties; the note records on that person are the growing pool of shared knowledge.

A person record should only be updated if the information in the record is incorrect. If the status or location of a particular person has changed, this should be indicated by adding a new note record associated with that person record.

4.1. person records

A person record contains 23 fields. There may be multiple person records for the same person. In fact, any given application that imports data from multiple sources is likely to acquire multiple person records for the same person. It is up to the application to associate such records (see Suggested relational database schema below). It is recommended that applications keep copies of all the records, and separately keep track of which records correspond to the same person.

Metadata about the record itself (9 fields)

This metadata is necessary to enable users of the data to trace and ascertain its reliability.

person_record_id (ASCII string): Unique identifier for this record, which consists of a lowercase ASCII domain name followed by a slash and a local identifier. The domain name identifies this record's original repository, which is the authority for this record. The format of the local identifier is up to the original repository. When the person_record_id begins with a domain other than the application's own domain, it means this record is a clone record.
entry_date (ASCII string in the form "yyyy-mm-ddThh:mm:ssZ"): Date in UTC that this copy of this record was stored (see Incremental export mechanism above).
expiry_date (ASCII string in the form "yyyy-mm-ddThh:mm:ssZ"): Date in UTC after which this record should be deleted (see Data expiry mechanism above).
author_name (Unicode string): The full name of the person who entered this record.
author_email (ASCII string): The preferred contact e-mail address of the person who entered this record.
author_phone (ASCII string): The preferred contact phone number of the person who entered this record.
source_name (Unicode string): The human-readable name of the original repository of this record.
source_date (ASCII string in the form "yyyy-mm-ddThh:mm:ssZ"): The date in UTC that the original copy of this record was created in its original repository.
source_url (ASCII string): The URL to this record in its original repository (as specific as possible, down to the URL of the individual record).
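
Since the domain part of a record identifier names the original repository, a repository can tell original records from clone records by comparing that domain with its own. A minimal Python sketch follows; own_domain is a hypothetical parameter holding the repository's own domain name, and the same functions apply to note_record_id.

```python
from typing import Tuple


def split_record_id(record_id: str) -> Tuple[str, str]:
    """Split "domain/local-id" at the first slash.

    Domain names cannot contain "/", so any further slashes belong to the
    local identifier, whose format is up to the original repository.
    """
    domain, sep, local_id = record_id.partition("/")
    if not sep or not domain or not local_id:
        raise ValueError(f"malformed record id: {record_id!r}")
    return domain, local_id


def is_clone(record_id: str, own_domain: str) -> bool:
    """True if the record originated in a repository other than own_domain."""
    domain, _ = split_record_id(record_id)
    return domain != own_domain.lower()
```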

Identifying information about a missing person (14 fields)

These fields contain information that is used to identify a person; this is information that is not expected to change unless it is incorrect. Searches for person records should search over these fields.

The other field is a very crude way to import foreign data; the formatting guidelines given for it below are intended to enable extraction of the foreign data if necessary. Free-form text, rather than XML, was chosen for the other field so that an application can easily display it directly in the UI.

full_name (Unicode string):

All names of the person sought or found, combined in the order and fashion customary to the person, language, and culture. For example, a typical English name would be formatted as a first, middle, and last name with spaces between them, whereas a typical Chinese name would be formatted with the family name first and no spaces between characters. Use newline characters (Unicode U+000A) to separate disconnected names (for example, for a person with both English and Chinese names), and place the more commonly used names first.

first_name (Unicode string):

Given name of the person sought or found, optionally followed by a space and any middle names or middle initials.

last_name (Unicode string):

Family name of the person sought or found.

sex (ASCII string):

Physical sex of the person sought or found, specified as one of the three strings female, male, or other. If the sex is unknown, omit this field.

date_of_birth (ASCII string in the form "yyyy", "yyyy-mm", or "yyyy-mm-dd"):

Exact or approximate date of birth of the person sought or found.

age (integer, or ASCII string in the form "min-max"):

Approximate age of the person sought or found, in years since birth as of the source_date of this record. The value of this field is either a single decimal integer or an inclusive range given as two decimal integers separated by a hyphen. This field has no defined meaning when source_date is missing.

home_street (Unicode string):

Street name of the home address of the person sought or found. To protect user privacy, applications should generally not require users to enter a street number in this field.

home_city (Unicode string):

Home city of the person sought or found.

home_neighborhood (Unicode string):

Name of the home neighborhood of the person sought or found. Use this field for the names of official or unofficial geographic regions not captured by the other address fields.

home_state (Unicode string):

Home state, province, territory, district, region, parish, county, or department of the person sought or found, specified as an uppercase ISO 3166-2 code.

home_postal_code (ASCII string):

Postal code of the home address of the person sought or found, in the format most commonly used in the country. Upgraded PFIF 1.1 repositories should export their existing home_zip values in this field.

home_country (ASCII string):

Home country of the person sought or found, specified as an uppercase two-letter ISO 3166-1 country code. Upgraded PFIF 1.1 repositories should set this field to US when exporting records whose home_state refers to a U.S. state or whose home_zip field contains a U.S. ZIP code.

photo_url (ASCII string):

URL to an image of an identifying photograph of the person sought or found.

other (large Unicode string):

Free-form text containing any other identifying data fields brought in from other sources. (Status data imported from other sources should go into a note record.) Short fields should be on a single line with the field name, a colon, and the field value. Long fields can be given as a line with the field name and a colon, then text indented on the following lines. When a record is converted from some other form to PFIF by a machine process, the field "automated-pfif-author" should be present and should name the program that produced the PFIF. The "automated-pfif-author" field is not added when records are exported from a PFIF repository. A description of the person in free-form text can also go here, with the field name "description". For example, a program that scrapes a record from a non-PFIF format that includes a free-form text field might produce an other field like this:

description:
    Dark hair, in her late thirties.
    Also goes by the names "Kate" or "Katie".
automated-pfif-author: ScrapeMatic 0.5

Field names for data fields imported from other applications should begin with a domain name and a slash, where the domain name identifies the entity that defined the field. For example, if example-format.org defines a missing persons format that contains a "birth_city" field, it would be imported into the PFIF other field like this:

example-format.org/birth_city: London, UK
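
A Python sketch of these formatting guidelines, and of extracting fields back out of an other field, appears below. The function names are hypothetical; parsing is heuristic, since the other field is free-form text, and lines that fit neither layout are kept under the key "" (a convention of this sketch only).

```python
from typing import Dict, List, Optional


def format_other(fields: Dict[str, str]) -> str:
    """Lay out name/value pairs: short values inline, multi-line values indented."""
    lines: List[str] = []
    for name, value in fields.items():
        if "\n" in value:
            lines.append(f"{name}:")
            lines.extend("    " + part for part in value.split("\n"))
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


def parse_other(text: str) -> Dict[str, str]:
    """Recover name/value pairs from text laid out as described above."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None  # long field whose indented lines we are reading
    for line in text.splitlines():
        if current is not None and line[:1] in (" ", "\t"):
            fields[current].append(line.strip())
            continue
        current = None
        name, sep, value = line.partition(":")
        if sep and name and not any(c.isspace() for c in name):
            if value.strip():
                fields[name] = [value.strip()]
            else:
                current = name  # "name:" alone starts a long field
                fields[name] = []
        elif line.strip():
            fields.setdefault("", []).append(line.strip())
    return {name: "\n".join(parts) for name, parts in fields.items()}
```

Applied to the ScrapeMatic example above, parse_other yields a "description" entry with the two indented lines and an "automated-pfif-author" entry with the value "ScrapeMatic 0.5".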

4.2. note records

Each note record belongs to exactly one person record. There may be any number of note records associated with a particular person record. (See below for implementation notes. A database might implement this by including a foreign key, person_record_id, that refers to the person record. An object-oriented representation might implement this by embedding a list of note objects within the person object.)

note records are used to provide updated, current information on a missing person. Every note has a timestamp and information on the author of the note. Applications can use the timestamp to determine the most recent value of a given field. Users can use the author information to ascertain the reliability of a given field.

Metadata about the record itself (8 fields)

note_record_id (ASCII string): Unique identifier for this record, which consists of a domain name followed by a slash and a local identifier. The domain name identifies this record's original repository, which is the authority for this record. The format of the local identifier is up to the original repository. When the note_record_id begins with a domain other than the application's own domain, it means this record is a clone of a record from another source.
person_record_id (ASCII string): The person_record_id of the person record to which this note belongs.
linked_person_record_id (ASCII string): The person_record_id of another person record to associate with the record to which this note belongs. When this field is present, it signifies that the author of this note believes that the two records identified by person_record_id and linked_person_record_id refer to the same person. If this field is present, the text field should explain how these records were determined to refer to the same person.
entry_date (ASCII string in the form "yyyy-mm-ddThh:mm:ssZ"): Date in UTC that this copy of this record was stored. A PFIF repository must guarantee that this value never decreases as records are added, so that a client can update a copy of a repository by querying for all records with an entry_date greater than or equal to the entry_date of the last received record.
author_name (Unicode string): The full name of the person who entered this note.
author_email (ASCII string): The preferred contact e-mail address of the person who entered this note.
author_phone (ASCII string): The preferred contact phone number of the person who entered this note.
source_date (ASCII string in the form "yyyy-mm-ddThh:mm:ssZ"): The date in UTC that the original copy of this note was created in its original repository. In most cases, notes should be sorted by this field for display.

Status information about a missing person (6 fields)

The found, status, email_of_found_person, phone_of_found_person and last_known_location fields store data that changes over time. When these fields are present in a note record, the record is specifying new values for these fields, and the source_date field indicates the date that the new values took effect. So, for example, an application that wants to display the most recent known location can look for the note with the latest source_date that has a non-empty last_known_location field.
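
The lookup described in the preceding paragraph might be written in Python as follows, assuming notes are dicts of PFIF field names to strings (latest_value is a hypothetical name). The same function works for any of these status fields.

```python
from datetime import datetime
from typing import Iterable, Optional


def _source_time(note: dict) -> datetime:
    """Parse source_date, allowing up to six digits of fractional seconds."""
    value = note["source_date"]
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in value else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(value, fmt)


def latest_value(notes: Iterable[dict], field: str) -> Optional[str]:
    """Value of `field` from the latest-dated note in which it is non-empty."""
    candidates = [note for note in notes if note.get(field)]
    if not candidates:
        return None
    return max(candidates, key=_source_time)[field]
```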

found (ASCII string):

This value is the string true if the missing person has been personally contacted or seen by the author of this note, or false otherwise. If this field is true, the text field of this note should describe HOW and WHEN the person was contacted or seen.

status (ASCII string):

Status of the person sought or found, specified as one of the following five strings:

information_sought: The author of the note is seeking information on the person in question.
is_note_author: The author of the note is the person in question.
believed_alive: The author of the note has received information that the person in question is alive.
believed_missing: The author of the note has reason to believe that the person in question is still missing.
believed_dead: The author of the note has received information that the person in question is dead.

email_of_found_person (ASCII string):

The current preferred contact e-mail address of the FOUND person. This field is present ONLY if the person has been FOUND. If this field is present, the text field of this note should describe HOW the person's contact information was determined.

phone_of_found_person (ASCII string):

The current preferred contact phone number of the FOUND person. This field is present ONLY if the person has been FOUND. If this field is present, the text field of this note should describe HOW the person's contact information was determined.

last_known_location (Unicode string):

A free-form description of the last known location of the person being sought, including the city, state, and as much detail as possible. If this field is present, the text field of this note should describe HOW the person's location was determined.

text (large Unicode string):

Free-form text description of the person's current condition, situation and location details, where they were last seen, corrections to other information, and so on.

5. XML format specification

The XML Namespace for PFIF is:

http://zesty.ca/pfif/1.3

The MIME type for a PFIF document is:

application/pfif+xml

A valid PFIF XML document consists of a single pfif element containing one or more person or note elements, each of which contains child elements for the fields described above. In a person element, the person_record_id, source_date, and full_name fields are mandatory. In a note element, the note_record_id, author_name, source_date, and text fields are mandatory. All other fields are optional. The order of the child elements within a person or note element is not significant.

A note element can exist inside or outside a person element. When a note element appears outside a person element, the note must contain a person_record_id. Otherwise, the person_record_id field is optional, and if present, must match the person_record_id of the enclosing person.
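
A minimal Python reader built on xml.etree.ElementTree can apply these rules, filling in person_record_id for notes nested inside a person and rejecting mismatches. The function names are hypothetical, and the sketch neither checks the mandatory fields nor validates against the schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

PFIF_NS = "http://zesty.ca/pfif/1.3"
_PREFIX = "{%s}" % PFIF_NS


def _fields(elem: ET.Element) -> Dict[str, str]:
    """Simple PFIF child elements of a person or note, as field name -> text."""
    return {
        child.tag[len(_PREFIX):]: child.text or ""
        for child in elem
        if child.tag.startswith(_PREFIX) and child.tag != _PREFIX + "note"
    }


def read_pfif(xml_text: str) -> Tuple[List[dict], List[dict]]:
    """Return (persons, notes) from a PFIF 1.3 document, resolving nested notes."""
    root = ET.fromstring(xml_text)
    ns = {"pfif": PFIF_NS}
    persons: List[dict] = []
    notes: List[dict] = []
    for person in root.findall("pfif:person", ns):
        fields = _fields(person)
        persons.append(fields)
        for note in person.findall("pfif:note", ns):
            n = _fields(note)
            # A nested note may omit person_record_id; if present it must match.
            if n.setdefault("person_record_id", fields["person_record_id"]) != fields["person_record_id"]:
                raise ValueError("nested note names a different person_record_id")
            notes.append(n)
    for note in root.findall("pfif:note", ns):
        n = _fields(note)
        if "person_record_id" not in n:
            raise ValueError("top-level note lacks person_record_id")
        notes.append(n)
    return persons, notes
```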

The RELAX NG Schema for PFIF, given in RELAX NG Compact Syntax, is as follows:

namespace pfif = "http://zesty.ca/pfif/1.3"

start = element pfif:pfif { person* & note* }

person = element pfif:person {
    element pfif:person_record_id { record_id } &
    element pfif:entry_date { time } ? &
    element pfif:expiry_date { time } ? &
    element pfif:author_name { text } ? &
    element pfif:author_email { email } ? &
    element pfif:author_phone { phone } ? &
    element pfif:source_name { text } ? &
    element pfif:source_date { time } &
    element pfif:source_url { url } ? &
    element pfif:full_name { text } &
    element pfif:first_name { text } ? &
    element pfif:last_name { text } ? &
    element pfif:sex { sex } ? &
    element pfif:date_of_birth { approx_date } ? &
    element pfif:age { approx_age } ? &
    element pfif:home_street { text } ? &
    element pfif:home_neighborhood { text } ? &
    element pfif:home_city { text } ? &
    element pfif:home_state { text } ? &
    element pfif:home_postal_code { text } ? &
    element pfif:home_country { country_code } ? &
    element pfif:photo_url { url } ? &
    element pfif:other { text } ? &
    note*
}

note = element pfif:note {
    element pfif:note_record_id { record_id } &
    element pfif:person_record_id { record_id } ? &
    element pfif:linked_person_record_id { record_id } ? &
    element pfif:entry_date { time } ? &
    element pfif:author_name { text } &
    element pfif:author_email { email } ? &
    element pfif:author_phone { phone } ? &
    element pfif:source_date { time } &
    element pfif:found { boolean } ? &
    element pfif:status { status } ? &
    element pfif:email_of_found_person { email } ? &
    element pfif:phone_of_found_person { phone } ? &
    element pfif:last_known_location { text } ? &
    element pfif:text { text }
}

record_id = xsd:string { pattern = ".+/.+" }
time = xsd:dateTime { pattern = "\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d(\.\d+)?Z" }
email = xsd:string { pattern = ".+@.+" }
phone = xsd:string { pattern = "[\-+()\d ]+" }
url = text
sex = "female" | "male" | "other"
approx_date = xsd:string { pattern = "\d\d\d\d(-\d\d(-\d\d)?)?" }
approx_age = xsd:string { pattern = "\d+(-\d+)?" }
country_code = xsd:string { pattern = "[A-Z][A-Z]" }
boolean = "true" | "false"
status = "information_sought" | "is_note_author" |
    "believed_alive" | "believed_missing" | "believed_dead"

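The lexical patterns in the schema can be mirrored in Python for quick checks outside a RELAX NG validator. This is a sketch, not a substitute for schema validation: XSD patterns are implicitly anchored, hence re.fullmatch, and the xsd:dateTime base type additionally rejects impossible values such as month 13, which these regular expressions do not. The check function name is hypothetical.

```python
import re

# Python equivalents of the datatype patterns in the schema above.
PATTERNS = {
    "record_id": r".+/.+",
    "time": r"\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d(\.\d+)?Z",
    "email": r".+@.+",
    "phone": r"[\-+()\d ]+",
    "approx_date": r"\d\d\d\d(-\d\d(-\d\d)?)?",
    "approx_age": r"\d+(-\d+)?",
    "country_code": r"[A-Z][A-Z]",
}
ENUMS = {
    "sex": {"female", "male", "other"},
    "boolean": {"true", "false"},
    "status": {"information_sought", "is_note_author",
               "believed_alive", "believed_missing", "believed_dead"},
}


def check(datatype: str, value: str) -> bool:
    """Return True if `value` matches the lexical form of the named schema datatype."""
    if datatype in ENUMS:
        return value in ENUMS[datatype]
    return re.fullmatch(PATTERNS[datatype], value) is not None
```
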
6. Atom feed specifications

PFIF XML documents can be embedded into Atom 1.0 feeds. The PFIF document should be embedded using an XML namespace and inserted as an immediate child of the entry element.

Atom 1.0 defines a top-level feed element that contains any number of entry elements. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:pfif="http://zesty.ca/pfif/1.3">
...
</feed>

The rest of this section offers recommendations on how applications should populate the standard Atom elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in Atom elements.

Two kinds of PFIF Atom feeds are defined here: person feeds in which each entry contains a person, and note feeds in which each entry contains a note. A person feed is roughly analogous to a blog feed containing blog entries; a note feed is roughly analogous to a comment feed on a particular blog entry. For example, one application might subscribe to a person feed in order to aggregate missing person records from other databases; another application might subscribe to a note feed in order to display a stream of notes with updates about a particular person.

6.1. Atom person feeds

An Atom person feed provides at least the following elements within the feed element:

id: This element should contain a unique URI associated with this feed. This might be the URL to the website that corresponds to the database or service providing this feed.
title: This element should contain the name of this feed. This should include the title of the database or service providing this feed.
subtitle: This element should contain a phrase or sentence describing this feed. This would be the place to explain how this feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://example.org/".
updated: This element should contain the date and time in UTC that this feed was last updated, given in "yyyy-mm-ddThh:mm:ssZ" format.
link: This element should contain a URL from which this feed can be retrieved. This element should have a rel attribute whose value is self.

An Atom person feed provides at least the following elements within each entry element:

pfif:person: This element contains child elements for the fields of the person record, as well as zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
id: This element should contain a URI string consisting of the scheme "pfif:" followed by the value of the person_record_id field.
title: This element should contain the value of the first_name field, followed by a space and the value of the last_name field in the person record.
author: This element should contain a name element containing the value of the author_name field and an email element containing the value of the author_email field in the person record.
updated: This element should contain the value of the source_date field in the person record.
content: This element should contain a human-readable HTML formatting of the information in the person record. It is up to the application to decide how to format the content.
source: This element should contain a copy of the title element of this feed. This element may also contain copies of any other child elements of the feed element.
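
The recommendations above could be applied with xml.etree.ElementTree as in the following Python sketch. The function name and the dict representation of a person record are assumptions; notes are omitted for brevity, and the HTML in the content element is just one possible formatting.

```python
import html
import xml.etree.ElementTree as ET
from typing import Optional

ATOM = "http://www.w3.org/2005/Atom"
PFIF = "http://zesty.ca/pfif/1.3"
ET.register_namespace("", ATOM)      # Atom as the default namespace
ET.register_namespace("pfif", PFIF)  # recommended pfif prefix


def atom_person_entry(person: dict, feed_title: str) -> ET.Element:
    """Build an Atom entry element for one person record given as a dict of PFIF fields."""
    entry = ET.Element(f"{{{ATOM}}}entry")

    def add(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
        el = ET.SubElement(parent, f"{{{ATOM}}}{tag}")
        el.text = text
        return el

    # Embedded PFIF data; this takes precedence over the Atom elements below.
    pfif_person = ET.SubElement(entry, f"{{{PFIF}}}person")
    for name, value in person.items():
        ET.SubElement(pfif_person, f"{{{PFIF}}}{name}").text = value

    add(entry, "id", "pfif:" + person["person_record_id"])
    add(entry, "title", f"{person.get('first_name', '')} {person.get('last_name', '')}".strip())
    author = add(entry, "author")
    add(author, "name", person.get("author_name", ""))
    if person.get("author_email"):
        add(author, "email", person["author_email"])
    add(entry, "updated", person["source_date"])
    # Human-readable summary; the exact HTML layout is up to the application.
    content = add(entry, "content", "<p>%s</p>" % html.escape(person["full_name"]))
    content.set("type", "html")
    source = add(entry, "source")
    add(source, "title", feed_title)
    return entry
```

ET.tostring(entry, encoding="unicode") serializes the entry; ElementTree escapes the HTML in the content element, as an Atom content element with type="html" expects.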

6.2. Atom note feeds

An Atom note feed provides at least the following elements within the feed element:

id: This element should contain a unique URI associated with this feed. This might be the URL to the website that corresponds to the database or service providing this feed.
title: This element should contain the name of this feed. This should include the title of the database or service providing this feed, followed by a more specific title that describes how the notes were selected from the database or service. For example, for a note feed about a particular person, the title could be the title of the service followed by the first name and last name of the person in question.
subtitle: This element should contain a phrase or sentence describing this feed. This would be the place to explain how this feed is produced, for example: "Exported by CiviCRM 1.1, http://www.example.org/."
updated: This element should contain the date and time in UTC that this feed was last updated, given in "yyyy-mm-ddThh:mm:ssZ" format.
link: This element should contain a URL from which this feed can be retrieved. This element should have a rel attribute whose value is self.

An Atom note feed provides at least the following elements within each entry element:

pfif:note: This element contains child elements for the fields of the note record.
id: This element should contain a URI string consisting of the scheme "pfif:" followed by the value of the note_record_id field.
title: This element should contain an excerpt of the text field.
author: This element should contain a name element containing the value of the author_name field and an email element containing the value of the author_email field in the note record.
updated: This element should contain the value of the source_date field in the note record.
content: This element should contain an HTML formatting of the text field in the note record. It is up to the application to decide how to format the content.
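
The element mapping for a note entry is sketched below as a Python function returning the text of each Atom element (a hypothetical helper; building the XML itself would follow the same pattern as for person entries). The specification does not say how long the title excerpt should be, so the 60-character limit and the trailing "..." are arbitrary choices.

```python
import html
from typing import Dict, Optional


def atom_note_fields(note: Dict[str, str], limit: int = 60) -> Dict[str, Optional[str]]:
    """Text for the Atom elements of a note entry; None means omit the element."""
    text = note["text"]
    collapsed = " ".join(text.split())  # excerpt on a single line
    title = collapsed
    if len(collapsed) > limit:
        head = collapsed[:limit]
        # Cut at the last space inside the limit, or hard-cut if there is none.
        title = (head.rsplit(" ", 1)[0] if " " in head else head) + "..."
    return {
        "id": "pfif:" + note["note_record_id"],
        "title": title,
        "author/name": note["author_name"],
        "author/email": note.get("author_email"),
        "updated": note["source_date"],
        # Escaped text with line breaks kept; the HTML layout is up to the application.
        "content": html.escape(text).replace("\n", "<br>"),
    }
```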

7. RSS feed specifications

PFIF XML documents can be embedded into RSS 2.0 feeds. (In RSS 2.0 terminology, this section defines an RSS 2.0 module.) The PFIF document should be specified using an XML namespace and embedded as an immediate child of the item element.

RSS 2.0 defines two main elements, channel and item, that are enclosed in a top-level rss element. The top-level element should declare the PFIF namespace. The recommended prefix is pfif, so the top-level element should look like this:

<rss version="2.0" xmlns:pfif="http://zesty.ca/pfif/1.3">
...
</rss>

The rest of this section offers recommendations on how applications should populate the standard RSS elements so that the feed will make sense to existing feed-reading software. Nonetheless, the embedded PFIF document takes precedence over any redundant information that appears in RSS elements.

As in the preceding section, two kinds of PFIF RSS feeds are defined here: person feeds in which each item contains a person, and note feeds in which each item contains a note.

7.1. RSS person feeds

An RSS person feed provides at least the following elements within the channel element:

title: This element should contain the name of this feed, which should include the title of the database or service providing this feed.
description: This element should contain a phrase or sentence describing this feed. This is the place to explain how this feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://example.org/".
lastBuildDate: This element should contain the date and time in UTC that this feed was last updated, given in RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT".
link: This element should contain a URL to the website that corresponds to the database or service providing this feed.

An RSS person feed provides at least the following elements within each item element:

pfif:person: This element contains child elements for the fields of the person record, as well as zero or more pfif:note elements. A service wishing to provide a complete export would include all the note records associated with the person here.
guid: This element should contain the value of the person_record_id field.
title: This element should contain the value of the first_name field, followed by a space and the value of the last_name field.
author: This element should contain the value of the author_email field, followed by a space and the value of the author_name field enclosed in parentheses.
pubDate: This element should contain the date in the source_date field in the person record, converted to RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT". The timezone MUST be GMT and the year MUST have four digits.
description: This element should contain a human-readable HTML formatting of the information in the person record. It is up to the application to decide how to format the description.
source: This element should contain the value of the source_name field.
link: This element should contain the value of the source_url field.
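The item mapping above can be sketched in Python as follows. This is an illustration under several assumptions: records arrive as dicts of field name to string, source_date uses the PFIF UTC timestamp form "YYYY-MM-DDThh:mm:ssZ" from the data model, the pfif namespace URI is assumed to be http://zesty.ca/pfif/1.3, and the function names and the <dl> rendering of the description are hypothetical choices, since the description format is left to the application.

```python
import html
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

PFIF_NS = "http://zesty.ca/pfif/1.3"  # assumed PFIF 1.3 namespace URI
ET.register_namespace("pfif", PFIF_NS)


def rfc822(pfif_date):
    """Convert a UTC timestamp like '2002-09-07T00:00:01Z' to RFC 822 in GMT."""
    dt = datetime.strptime(pfif_date, "%Y-%m-%dT%H:%M:%SZ")
    # format_datetime yields a zero-padded day and a four-digit year.
    return format_datetime(dt.replace(tzinfo=timezone.utc), usegmt=True)


def add_pfif_fields(parent, tag, record):
    """Append a pfif:<tag> element with one child element per field."""
    elem = ET.SubElement(parent, f"{{{PFIF_NS}}}{tag}")
    for field, value in record.items():
        ET.SubElement(elem, f"{{{PFIF_NS}}}{field}").text = value
    return elem


def person_item(person, notes=()):
    """Build an RSS <item> for one person record (hypothetical helper)."""
    item = ET.Element("item")
    pfif_person = add_pfif_fields(item, "person", person)
    for note in notes:  # a complete export would nest every associated note
        add_pfif_fields(pfif_person, "note", note)

    ET.SubElement(item, "guid").text = person["person_record_id"]
    ET.SubElement(item, "title").text = (
        f"{person.get('first_name', '')} {person.get('last_name', '')}")
    if person.get("author_email") and person.get("author_name"):
        ET.SubElement(item, "author").text = (
            f"{person['author_email']} ({person['author_name']})")
    ET.SubElement(item, "pubDate").text = rfc822(person["source_date"])
    # One possible HTML rendering; ElementTree escapes it as element text.
    rows = "".join(f"<dt>{html.escape(k)}</dt><dd>{html.escape(v)}</dd>"
                   for k, v in person.items())
    ET.SubElement(item, "description").text = f"<dl>{rows}</dl>"
    if "source_name" in person:
        ET.SubElement(item, "source").text = person["source_name"]
    if "source_url" in person:
        ET.SubElement(item, "link").text = person["source_url"]
    return item
```

Serializing the result with ET.tostring(item, encoding="unicode") declares the pfif prefix on the item element, so each record field appears as a pfif:* child of pfif:person.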

7.2. RSS note feeds

An RSS note feed provides at least the following elements within the channel element:

title: This element should contain the name of this feed. This should include the title of the database or service providing this feed, followed by a more specific title that describes how the notes were selected from the database or service. For example, for a note feed about a particular person, the title could be the title of the service followed by the first name and last name of the person in question.
description: This element should contain a phrase or sentence describing the feed. This is the place to explain how the feed is produced, for example: "Scraped daily by FooMatic 2.3 from http://www.example.org/".
lastBuildDate: This element should contain the date and time in UTC that this feed was last updated, given in RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT".
link: This element should contain a URL to the website that corresponds to the database or service providing this feed. For a note feed about a particular person, this link could point to the web page for that person's record.

An RSS note feed provides at least the following elements within each item element:

pfif:note: This element contains child elements for the fields of the note record.
guid: This element should contain the value of the note_record_id field.
author: This element should contain the value of the author_email field, followed by a space and the value of the author_name field enclosed in parentheses.
pubDate: This element should contain the date in the source_date field in the note record, converted to RFC 822 date format, for example: "Sat, 07 Sep 2002 00:00:01 GMT". The timezone MUST be GMT and the year MUST have four digits.
description: This element should contain an HTML formatting of the text field in the note record. It is up to the application to decide how to format the description.
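A corresponding sketch for note items is shown below, under the same assumptions as a person item: dict input, PFIF-style UTC timestamps in source_date, and an assumed pfif namespace URI of http://zesty.ca/pfif/1.3. The helper names are hypothetical, and escaping the text while turning line breaks into <br> is only one possible HTML formatting.

```python
import html
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

PFIF_NS = "http://zesty.ca/pfif/1.3"  # assumed PFIF 1.3 namespace URI
ET.register_namespace("pfif", PFIF_NS)


def rfc822(pfif_date):
    """Convert a UTC timestamp like '2011-03-07T12:00:00Z' to RFC 822 in GMT."""
    dt = datetime.strptime(pfif_date, "%Y-%m-%dT%H:%M:%SZ")
    return format_datetime(dt.replace(tzinfo=timezone.utc), usegmt=True)


def note_item(note):
    """Build an RSS <item> for one note record (hypothetical helper)."""
    item = ET.Element("item")
    pfif_note = ET.SubElement(item, f"{{{PFIF_NS}}}note")
    for field, value in note.items():
        ET.SubElement(pfif_note, f"{{{PFIF_NS}}}{field}").text = value

    ET.SubElement(item, "guid").text = note["note_record_id"]
    if note.get("author_email") and note.get("author_name"):
        ET.SubElement(item, "author").text = (
            f"{note['author_email']} ({note['author_name']})")
    ET.SubElement(item, "pubDate").text = rfc822(note["source_date"])
    # Escape the note text so it cannot inject markup, then keep line breaks.
    body = html.escape(note.get("text", "")).replace("\n", "<br>")
    ET.SubElement(item, "description").text = f"<p>{body}</p>"
    return item
```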

8. Suggested relational database schema

This section suggests a possible relational database schema for storing PFIF data. The exact details of a database design are up to each application; this is one possible starting point. A relational database could store PFIF records in two tables, person and note, for the two types of records.

PERSON table:
     string      person_record_id           primary key
     datetime    entry_date
     datetime    expiry_date
     string      author_name
     string      author_email
     string      author_phone
     string      source_name
     datetime    source_date
     string      source_url
     string      full_name
     string      first_name
     string      last_name
     string      sex
     string      date_of_birth
     string      age
     string      home_street
     string      home_neighborhood
     string      home_city
     string      home_state
     string      home_postal_code
     string      home_country
     string      photo_url
     text        other

NOTE table:
     string      note_record_id             primary key
     string      person_record_id           foreign key not null
     string      linked_person_record_id    foreign key or null
     datetime    entry_date
     string      author_name
     string      author_email
     string      author_phone
     datetime    source_date
     boolean     found
     string      status
     string      email_of_found_person
     string      phone_of_found_person
     string      last_known_location
     text        text
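As one concrete rendering of these tables, the sketch below creates them in SQLite through Python's sqlite3 module. The type mapping is an assumption, since SQLite has no native datetime or boolean type: datetimes are stored as TEXT in the PFIF UTC form and booleans as INTEGER 0/1. open_pfif_db is a hypothetical helper name. The person table also includes home_country, a person field that section 9.1 lists as added in PFIF 1.2.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    person_record_id   TEXT PRIMARY KEY,
    entry_date         TEXT,   -- datetimes as 'YYYY-MM-DDThh:mm:ssZ'
    expiry_date        TEXT,
    author_name        TEXT,
    author_email       TEXT,
    author_phone       TEXT,
    source_name        TEXT,
    source_date        TEXT,
    source_url         TEXT,
    full_name          TEXT,
    first_name         TEXT,
    last_name          TEXT,
    sex                TEXT,
    date_of_birth      TEXT,
    age                TEXT,
    home_street        TEXT,
    home_neighborhood  TEXT,
    home_city          TEXT,
    home_state         TEXT,
    home_postal_code   TEXT,
    home_country       TEXT,   -- person field added in PFIF 1.2
    photo_url          TEXT,
    other              TEXT
);
CREATE TABLE note (
    note_record_id           TEXT PRIMARY KEY,
    person_record_id         TEXT NOT NULL REFERENCES person(person_record_id),
    linked_person_record_id  TEXT REFERENCES person(person_record_id),
    entry_date               TEXT,
    author_name              TEXT,
    author_email             TEXT,
    author_phone             TEXT,
    source_date              TEXT,
    found                    INTEGER CHECK (found IN (0, 1)),  -- boolean
    status                   TEXT,
    email_of_found_person    TEXT,
    phone_of_found_person    TEXT,
    last_known_location      TEXT,
    text                     TEXT
);
"""


def open_pfif_db(path=":memory:"):
    """Open a SQLite database and create the person and note tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

Storing timestamps as fixed-width UTC strings keeps lexical order equal to chronological order, so ORDER BY source_date works without conversion. Enforcing the foreign keys rejects notes whose person record is not stored locally; an application that accepts standalone notes (permitted in the XML format since PFIF 1.2) might choose to leave enforcement off.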

To link a foreign person record with a local person record, the application adds a note associated with the local person record, with a linked_person_record_id field containing the person_record_id of the foreign record. The other fields of the note describe the circumstances of the decision to merge: source_date indicates the date of the decision, text gives the reason for the decision, and author_name names the person, program, or other entity that made the decision. This specification does not dictate how an application would decide whether to merge two records; a merge could be initiated by a human operator or by a software algorithm that looks for records with similar data. Recording the merge decision in a note record makes it possible to back out of a bad merge decision, and recording the name of the person or program in the author_name field makes it possible to track down the cause of an incorrect merge.
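The merge procedure above can be sketched against a SQLite note table with the columns of the suggested schema. Both function names are hypothetical, and the caller is assumed to supply a unique note_record_id and a PFIF-style UTC timestamp for the decision.

```python
import sqlite3


def record_merge(conn: sqlite3.Connection, note_record_id, local_id, foreign_id,
                 decided_by, reason, decided_at):
    """Link foreign_id to local_id by adding a note to the local person record.

    decided_by goes in author_name, reason in text, decided_at in source_date.
    """
    conn.execute(
        "INSERT INTO note (note_record_id, person_record_id,"
        " linked_person_record_id, source_date, author_name, text)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (note_record_id, local_id, foreign_id, decided_at, decided_by, reason))


def merges_by(conn: sqlite3.Connection, decided_by):
    """List the merge notes made by one person or program, e.g. to audit a bad merge."""
    return conn.execute(
        "SELECT note_record_id, person_record_id, linked_person_record_id"
        " FROM note WHERE author_name = ?"
        " AND linked_person_record_id IS NOT NULL"
        " ORDER BY note_record_id",
        (decided_by,)).fetchall()
```

Because each decision is its own row, finding every merge made by a faulty matcher is a single query on author_name.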

When displaying a person record, the application can then look for all the non-empty linked_person_record_id fields among the notes that belong to that person record, and display all the linked records or a merged view of the linked records.
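A sketch of this lookup against the suggested SQLite tables follows. linked_records and merged_view are hypothetical names, linked records that are not stored locally are simply skipped by the join, and the merge policy shown (fill each empty field from the first linked record that has a value) is only one possibility, since this specification does not define a merged view.

```python
import sqlite3


def linked_records(conn: sqlite3.Connection, person_record_id):
    """Return, as dicts, the locally stored person rows linked from this record's notes."""
    cur = conn.execute(
        "SELECT p.* FROM person AS p"
        " JOIN (SELECT DISTINCT linked_person_record_id AS id FROM note"
        "       WHERE person_record_id = ?"
        "         AND linked_person_record_id IS NOT NULL"
        "         AND linked_person_record_id != '') AS l"
        " ON p.person_record_id = l.id"
        " ORDER BY p.person_record_id",
        (person_record_id,))
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]


def merged_view(own, linked):
    """Fill empty fields of own from the linked records, in order (one possible policy)."""
    merged = dict(own)
    for rec in linked:
        for field, value in rec.items():
            if not merged.get(field) and value:
                merged[field] = value
    return merged
```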

9. Changes from previous versions

9.1. Changes from PFIF 1.1 to PFIF 1.2

person records gained four new fields: sex, date_of_birth, age, and home_country. The home_zip field was replaced with home_postal_code.

note records gained three new fields: person_record_id, linked_person_record_id, and status.

In the PFIF XML format, note elements became allowed outside of person elements. Aside from the note_record_id and person_record_id fields, which had to appear first, the rest of the child elements became permissible in any order.

Atom entries and RSS items came to contain individual pfif:person and pfif:note elements with no enclosing pfif:pfif element.

9.2. Changes from PFIF 1.2 to PFIF 1.3

The source_date field became mandatory on person records. Records can be updated by (and only by) their original repository, and the source_date must be updated when a record changes.

person records gained the mandatory full_name field; first_name and last_name became optional.

person records gained the new expiry_date field, with conformance requirements for data deletion and propagation of the expiry date.

In the PFIF XML format, all the child elements of person elements and note elements became permissible in any order.

10. Acknowledgements

The initial data model on which the first version of PFIF was based is due to the CiviCRM team, David Geilhufe, and Kieran Lal. Luke Blanshard, Tony Chang, Josh Kleinpeter, Kieran Lal, Jonathan Plax, Gabe Wachob, Ka-Ping Yee, Steve Hakusa, Mark Prutsalis, Lee Schumacher, and other participants on the working group list (pfifspam@googlegroups.com) contributed to the current design of PFIF.