TL;DR — Long-term preservation

This project preserves multiple historical databases as stable, self-contained digital archives rather than transient web content. Guided by principles consistent with ISO 16363, it prioritises provenance, data integrity, and independence from any single website.
Preserved copies are designed to outlast the current platform and are intended for long-term deposit with trusted preservation institutions, including the Internet Archive, with a preservation horizon of 50–100 years.

Why preservation matters

Historical data does not fail all at once. It disappears slowly — through software upgrades, server failures, undocumented edits, lost context, or the quiet abandonment of websites.

The purpose of this project is not simply to display historical data, but to preserve it faithfully and intelligibly for the long term so that it can remain usable well beyond the lifetime of this website.

What we are preserving

This archive currently consists of multiple independent databases (13 at present), each representing a distinct historical dataset with its own structure, sources, and provenance.

Rather than merging or simplifying these datasets, we preserve each one as a self-contained historical object, including:

  • The original structure of the data
  • Relationships between people, households, places, and assets
  • Original spellings, ambiguities, and inconsistencies
  • Source references and import provenance

No dataset is treated as disposable or temporary. Each is curated as a long-term digital artefact.

What does “digital preservation” mean?


Digital preservation is not the same as backup. Backups protect against short-term loss. Preservation is about ensuring that data remains understandable, verifiable, and reusable decades into the future, even when today’s software and websites no longer exist. To guide this work, we follow principles consistent with ISO 16363.

ISO 16363 — in plain language 

ISO 16363 is an international standard used by libraries, archives, and research institutions
to assess whether a digital repository can be trusted to preserve material over the long term.

In plain terms, its criteria can be summarised as three questions:

  1. Is the data managed responsibly?
    (Clear scope, documented handling, no silent alteration)
  2. Is the integrity of the data protected?
    (Provenance, versioning, fixity, reversibility)
  3. Could the data outlive the system that currently hosts it?
    (Portability, documentation, independence from a single website)

This project is not formally certified under ISO 16363. However, it is designed to operate
in alignment with its principles at a scale appropriate to an independent digital humanities archive.

How we apply these principles

1. Faithful representation

We do not “clean”, modernise, or reinterpret historical data silently.
Original spellings, structures, and uncertainties are preserved as recorded.

2. Explicit provenance

Every dataset, table, and import batch is traceable back to its source material. Changes are documented and reversible.

3. Fixed and finite datasets

These are historical records, not live feeds. Once a dataset is complete, it becomes stable,
allowing preservation work to focus on integrity rather than constant change.

4. Independence from the website

The website is a presentation layer, not the archive itself. All datasets are designed to be exportable, intelligible, and preservable outside this site.

5. Integrity, fixity, and protection against silent change

Preservation is not only about keeping data available, but about ensuring that it has not been altered over time. To support this, the project uses cryptographic hashing to protect the integrity of preserved materials.

A cryptographic hash is a short digital fingerprint generated from a file or dataset. If even a single character changes, the hash value changes. This makes it possible to verify, years or decades later, that preserved data remains bit-for-bit identical to the original.
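
As a small illustration, the Python sketch below shows how a one-character difference produces a different fingerprint. It assumes SHA-256, which is only one possible choice; this page does not state which hash algorithm the project uses. The two sample records are invented for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Invented sample records that differ by a single character ("Smyth" vs "Smith").
original = "Smyth, John; farmer; parish of Kilmore".encode("utf-8")
altered = "Smith, John; farmer; parish of Kilmore".encode("utf-8")

print(fingerprint(original))
print(fingerprint(altered))
print(fingerprint(original) == fingerprint(altered))  # False
```

The same input always yields the same fingerprint, so a value recorded today can be recomputed and compared at any later date.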

For each preserved dataset and archival export, hash values are generated and recorded at the point of preservation. These hashes allow:

  • Verification that no silent corruption or alteration has occurred
  • Independent validation of preserved copies held in different locations
  • Future integrity checks without reliance on this website or its software
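
The recording and checking steps described above could be sketched as follows. This is a minimal example under stated assumptions, not a description of the project's actual tooling: the SHA-256 algorithm, the manifest file name `manifest-sha256.txt`, and the function names are all hypothetical. Each manifest line holds a hash, two spaces, and a relative file path, a plain-text layout chosen so that it stays readable without any special software.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large exports need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(export_dir: Path, manifest_name: str = "manifest-sha256.txt") -> Path:
    """Record one 'hash  relative/path' line for every file under export_dir."""
    manifest = export_dir / manifest_name
    lines = []
    for path in sorted(export_dir.rglob("*")):
        if path.is_file() and path != manifest:
            relative = path.relative_to(export_dir).as_posix()
            lines.append(f"{sha256_of_file(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(export_dir: Path, manifest_name: str = "manifest-sha256.txt") -> list[str]:
    """Return the relative paths of files that are missing or no longer match."""
    problems = []
    manifest = export_dir / manifest_name
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        recorded, relative = line.split("  ", 1)
        target = export_dir / relative
        if not target.is_file() or sha256_of_file(target) != recorded:
            problems.append(relative)
    return problems
```

In this sketch, `write_manifest` would be run once at the point of preservation, and `verify_manifest` at any later date on any copy; an empty list means every recorded file is present and unchanged. The check covers only files listed in the manifest, so files added afterwards would need a separate comparison.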

This approach aligns with the concept of fixity used in digital preservation practice and referenced by standards such as ISO 16363.

By recording and preserving integrity information alongside the data itself, the project ensures that future custodians can distinguish between faithful preservation and unrecorded modification, even if stewardship changes.

Preservation beyond this website

Websites are fragile. Domains lapse. Software changes. Projects end. For that reason, this archive is designed to outlast its own interface. Our long-term preservation strategy includes:

  • Flattened archival exports
    Each database can be preserved as a complete, self-contained dataset, independent of WordPress or custom software.
  • Multiple preservation copies
    Copies are maintained separately from the live site to reduce single-point failure.
  • Planned deposit with the Internet Archive
    Preserved datasets are intended for deposit with the Internet Archive, placing them under the long-term public stewardship of a trusted, independent preservation institution.
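
To make the idea of a flattened export concrete, the sketch below writes every table of a database to its own CSV file with a header row. It is an illustration under assumptions: it uses SQLite from the Python standard library as a stand-in, whereas the project's databases may run on a different engine, and the function name `export_tables_to_csv` is hypothetical. A real archival export would also carry documentation of the schema and relationships alongside the data files.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path: str, out_dir: Path) -> list[Path]:
    """Write each user table in a SQLite database to '<table>.csv' in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    connection = sqlite3.connect(db_path)
    try:
        # List user tables from SQLite's own catalogue, skipping internal ones.
        tables = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )]
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = connection.execute(f"SELECT * FROM {quoted}")
            target = out_dir / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                # Header row: column names exactly as stored in the database.
                writer.writerow([column[0] for column in cursor.description])
                writer.writerows(cursor)
            written.append(target)
    finally:
        connection.close()
    return written
```

Plain CSV in UTF-8 is used here because it can be opened without the original database software, which is the point of an export that is independent of the website.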

The goal is not merely survival for a few years, but meaningful accessibility for 50–100 years.

This project follows principles consistent with ISO 16363 trusted digital repository practice,
adapted for an independent digital humanities archive.