Researcher Documentation

Guide for Researchers

A comprehensive guide to using AREUEA Data Insights, a community-curated dataset catalog enriched with expert insights for real estate and urban economics research.

Audience: graduate students, faculty, and research staff
Last updated: January 2, 2026
01

What this site is (and is not)

What it is

  • A discoverability layer: browse datasets, scan metadata, identify fit-for-purpose sources
  • A contribution layer: propose datasets, suggest revisions, contribute insights
  • A governance layer: structured review process preserving academic integrity

What it is not

  • Not a data repository (does not host raw data files)
  • Not a promise of unrestricted access (some datasets are vendor/restricted)
  • Not a substitute for provider documentation (canonical definitions and variable meanings remain with the provider)
03

How to browse the dataset catalog

A productive "data discovery" workflow typically looks like this (a filtering sketch in code follows the list):

  1. Filter broadly by topical classification (e.g., Residential / Commercial / REITs) and geography
  2. Scan temporal coverage (Start/End year) and update frequency to validate feasibility for your study window
  3. Check access constraints (public vs. vendor vs. restricted), and whether the provider offers an API
  4. Review metadata depth and practical notes: unit of observation, key variables, known limitations, and expert insights (when available)
  5. Triangulate: compare alternative datasets and cross-check critical claims against provider documentation
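As a concrete illustration of steps 1-3, here is a minimal Python sketch that filters hypothetical catalog metadata; the field names are assumptions for illustration, not the catalog's actual schema:

```python
# Minimal sketch of steps 1-3 as a metadata filter. Field names
# (topic, start_year, access_type, ...) are illustrative assumptions.

catalog = [
    {"name": "Dataset A", "topic": "Residential", "start_year": 1995,
     "end_year": 2023, "access_type": "public", "has_api": True},
    {"name": "Dataset B", "topic": "Residential", "start_year": 2010,
     "end_year": 2024, "access_type": "vendor", "has_api": False},
]

def candidates(records, topic, study_start, study_end, allow_licensed=False):
    """Yield datasets whose coverage spans the study window and whose
    access conditions are acceptable."""
    for r in records:
        if r["topic"] != topic:
            continue
        if r["start_year"] > study_start or r["end_year"] < study_end:
            continue  # temporal coverage does not span the study window
        if r["access_type"] != "public" and not allow_licensed:
            continue  # avoid "false availability": listed but contract-bound
        yield r

print([r["name"] for r in candidates(catalog, "Residential", 2000, 2020)])
# -> ['Dataset A']  (Dataset B starts in 2010, after the study window opens)
```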

Interpreting key fields

Understanding these fields helps you quickly evaluate whether a dataset fits your research needs; one way such a record might be modeled in code appears after the list:

  • Dataset Name / Provider / URL: the minimal provenance triad
  • Access Type: helps avoid "false availability" (a dataset may exist but be contract-bound)
  • Start Year / End Year: a first-order constraint for identification strategies, policy timing, and sample construction
  • Geographic Coverage + Geographies Available: distinguishes conceptual coverage from the granularity of identifiers
  • Unit of Observation: crucial for merges and inference (property, transaction, loan, household, firm)
  • Key Variables / Documentation: signals replicability and downstream usability (and provides reference definitions)
  • Advantages / Limitations: where experienced users encode "the stuff that bites you"
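For illustration, here is one way such a record could be modeled; the class and field names below are assumptions, not the catalog's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Illustrative model only; names are assumptions, not the real schema.
    name: str                      # Dataset Name
    provider: str                  # Provider
    url: str                       # stable landing-page URL (provenance triad)
    access_type: str               # "public" | "vendor" | "restricted"
    start_year: int                # temporal coverage
    end_year: int
    geographic_coverage: str       # conceptual scope, e.g. "United States"
    geographies_available: list[str] = field(default_factory=list)  # e.g. ["state", "county", "tract"]
    unit_of_observation: str = ""  # property, transaction, loan, household, firm
    key_variables: list[str] = field(default_factory=list)
    advantages: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
```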
04

Contributing to the catalog

AREUEA Data Insights is meant to reflect the research community's working knowledge: not only what a dataset is, but how it behaves empirically and what it is good for in real research workflows.

Submitting a new dataset

Add a new dataset to the catalog with research-relevant insights. Strong submissions include advantages, limitations, alternatives, and practical details (frequency, coverage, and identifiers), plus a stable provider link for reference; a sketch of such a submission appears after the list.

  • Data description (what it measures, how it is constructed)
  • Advantages (what it is uniquely good for; where it performs well)
  • Limitations (coverage gaps, measurement error, duplicates, bias, known breakpoints)
  • Alternative datasets / substitutes (and how they differ in coverage, cleaning, and access)
  • Data details: frequency of the data, frequency of update, unit of observation, geographic coverage, and finest geographic identifier
  • Provider URL + documentation (landing page + codebook/data dictionary, for reference if available)
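For illustration, a complete submission assembled as structured data might look like the sketch below; the payload shape, keys, and values are all assumptions, not the actual submission format:

```python
# Illustrative shape of a new-dataset submission; keys are assumptions.
submission = {
    "name": "Example Transactions Panel",
    "provider": "Example Provider",
    "url": "https://provider.example/datasets/transactions",
    "description": "Deed-level transaction records built from county recorder filings.",
    "advantages": ["Near-universal coverage of arm's-length sales"],
    "limitations": ["Duplicates across re-recorded deeds", "Known coverage break circa 2013"],
    "alternatives": ["Vendor X deeds (cleaner, but licensed)"],
    "details": {
        "data_frequency": "daily",
        "update_frequency": "monthly",
        "unit_of_observation": "transaction",
        "geographic_coverage": "United States",
        "finest_geography": "parcel",
    },
    "documentation_url": "https://provider.example/docs/codebook",
}
```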

Submitting an expert insight

Add practical, researcher-to-researcher notes about how a dataset behaves in real projects. Insights can take the form of a recommendation, general feedback, a correction, or a question/clarification, and they are most useful when they include concrete details.

An expert insight is most valuable when it is (see the sketch after this list):

  • Methodologically specific (e.g., "coverage shifts after 2013 due to…", sample definition, known breaks)
  • Operational (how to merge, how identifiers behave, recommended filters, and common cleaning gotchas)
  • Evidence-backed (links to papers, provider docs, replication code, and any relevant validation evidence)
  • Explicit about scope (geography/time/segment where the insight holds, and where it may not apply)
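To make these criteria concrete, here is one hypothetical shape such an insight could take as structured data; every key and value is an illustrative assumption:

```python
# Illustrative shape of an expert insight; keys and values are assumptions.
insight = {
    "dataset": "Example Transactions Panel",
    "type": "correction",            # recommendation | feedback | correction | question
    "claim": "Coverage shifts after 2013 due to a change in the sampling frame.",
    "scope": {"geography": "US counties", "years": [2013, 2024], "segment": "residential"},
    "operational_notes": "Deduplicate on (parcel_id, recording_date) before merging.",
    "evidence": ["https://provider.example/docs/release-notes-2013"],
}
```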
05

Review and governance

AREUEA Data Insights supports two distinct review concepts:

1. Submission reviews (automated governance)

Reviews of new dataset submissions or revision submissions

2. Post-publication dataset reviews (manual QC)

Reviews of already published datasets for ongoing quality control

Submission review pipeline

When a submission is made, it moves through a structured pipeline (a toy code sketch follows the list):

  1. Triage & duplicate guard: detects whether the submission duplicates an existing dataset
  2. Reviewer assignment: assigns a minimum number of reviewers
  3. Reviewer evaluations: reviewers record a decision (approve / reject / needs info) and a rating
  4. Consensus calculation: aggregates reviews into an auto-decision
  5. Auto-publish (if approved): publishes a new dataset record or patches an existing dataset
  6. Audit trail: records before/after snapshots of changes
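For orientation only, a toy sketch of the pipeline's shape follows; every function and data structure in it is an illustrative stand-in, not the system's actual implementation:

```python
import random

# Toy, runnable sketch of the pipeline's shape; all functions are stand-ins.

def is_duplicate(sub, catalog):              # 1. triage & duplicate guard
    return sub["name"] in catalog

def assign_reviewers(pool, minimum=3):       # 2. reviewer assignment (minimum quorum)
    return random.sample(pool, minimum)

def evaluate(reviewer, sub):                 # 3. reviewer evaluation (stubbed)
    return {"tier": reviewer["tier"], "decision": "approve", "rating": 4}

def run_pipeline(sub, catalog, pool):
    if is_duplicate(sub, catalog):
        return "blocked: duplicate"
    reviews = [evaluate(r, sub) for r in assign_reviewers(pool)]
    return reviews  # steps 4-6 (consensus, publish, audit) are sketched below

pool = [{"name": f"reviewer{i}", "tier": t} for i, t in enumerate([1, 2, 3, 3])]
print(run_pipeline({"name": "New Dataset"}, {"Existing Dataset"}, pool))
```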

Trust-weighted consensus

Not all reviews carry identical weight. Review influence is scaled by the reviewer's trust-score tier:

  • Tier 3 (3× weight): senior experts, stewards
  • Tier 2 (2× weight): established reviewers
  • Tier 1 (1× weight): new or untested reviewers

The system aggregates the weighted approval total, the weighted rejection total, a weighted average rating, and the review count. A submission can be paused or blocked if risk flags are present (e.g., duplication, PII risk, unknown license) or if quorum thresholds are not met.
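A minimal sketch of how such a trust-weighted consensus could be computed follows; the quorum size, approval threshold, and decision rule are assumptions chosen for illustration:

```python
# Minimal sketch of trust-weighted consensus; thresholds and decision rules
# are illustrative assumptions, not the system's actual values.
TIER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 3.0}

def consensus(reviews, min_reviews=3, approve_share=0.66, risk_flags=()):
    """Aggregate weighted reviews into an auto-decision."""
    if risk_flags:                       # e.g. duplication, PII risk, license unknown
        return "paused: " + ", ".join(risk_flags)
    if len(reviews) < min_reviews:       # quorum threshold not met
        return "pending: quorum not met"
    approve = sum(TIER_WEIGHTS[r["tier"]] for r in reviews if r["decision"] == "approve")
    reject  = sum(TIER_WEIGHTS[r["tier"]] for r in reviews if r["decision"] == "reject")
    total_w = sum(TIER_WEIGHTS[r["tier"]] for r in reviews)
    avg_rating = sum(TIER_WEIGHTS[r["tier"]] * r["rating"] for r in reviews) / total_w
    if approve / (approve + reject or 1) >= approve_share:
        return f"approve (weighted rating {avg_rating:.1f})"
    return "reject"

reviews = [
    {"tier": 3, "decision": "approve", "rating": 5},
    {"tier": 2, "decision": "approve", "rating": 4},
    {"tier": 1, "decision": "reject",  "rating": 2},
]
print(consensus(reviews))  # -> approve (weighted rating 4.2)
```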

Revisions are “patches,” not rewrites

Revision submissions are designed to preserve catalog integrity (a minimal code sketch follows the list):

  • A revision proposal is stored as Proposed JSON, intentionally omitting blank fields so the system does not overwrite existing values
  • On approval, the system applies a differential update (only changed fields are updated)
  • Every publish action generates an entry in the Revisions audit trail, enabling traceability
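A minimal sketch of the differential-update idea, assuming dictionary-shaped records with illustrative field names:

```python
# Minimal sketch of a differential ("patch") update: blank fields are omitted
# from the proposal so existing values are never overwritten. Dict-based
# records and field names are assumptions for illustration.

def apply_revision(current: dict, proposed: dict) -> tuple[dict, dict]:
    """Apply only the changed fields; return the new record and an audit entry."""
    changed = {k: v for k, v in proposed.items()
               if v not in (None, "") and current.get(k) != v}
    updated = {**current, **changed}
    audit = {"before": {k: current.get(k) for k in changed}, "after": changed}
    return updated, audit

record = {"name": "Dataset A", "end_year": 2022, "unit_of_observation": "loan"}
proposal = {"end_year": 2024, "unit_of_observation": ""}   # blank field is skipped
record, audit_entry = apply_revision(record, proposal)
print(record)       # end_year updated to 2024; unit_of_observation untouched
print(audit_entry)  # before/after snapshot for the Revisions audit trail
```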
06

Provenance, reproducibility, and citations

Provenance expectations

A record in the catalog is a scholarly pointer: it should help you identify and evaluate a data source quickly. However, dataset providers may change variable definitions, coverage, sampling frames, access pathways, and terms of use. Treat the catalog as a living index and verify critical details against provider documentation and archived materials.

Suggested citation practice

When referencing a dataset discovered via AREUEA Data Insights, a conservative academic practice is to cite (an illustrative template follows the list):

  • The dataset provider's canonical citation (if provided)
  • The dataset landing page URL
  • The date accessed
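For instance, a generic template along these lines (purely illustrative; defer to the provider's preferred citation format when one exists):

  Provider Name (Year). Dataset Title [data set]. URL of landing page. Accessed Month Day, Year.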

If you reference the catalog entry itself (e.g., as a curated index), cite it as you would a database or curated resource, including an access date.

07

Frequently asked questions

Is everything here publicly accessible?

No. Some datasets are vendor-licensed or restricted. The catalog aims to describe access conditions.

Can I trust the metadata?

Treat the catalog metadata as curated but fallible. The governance process emphasizes review and auditability, but providers and environments change.

How do I propose a correction?

Use the relevant submission mechanism (new dataset or revision). Provide evidence (links, papers, documentation) whenever possible.

How do I become a reviewer?

Register as a contributor, and (separately) join the reviewer pool through the governance process used by AREUEA administrators.

Ready to explore the catalog?

Start browsing datasets, contribute to the community, or register as a contributor to participate in the peer review process.