Researcher Documentation

Guide for Researchers

A comprehensive guide to using AREUEA Data Insights, a community-curated dataset catalog enriched with expert insights for real estate and urban economics research.

Audience: graduate students, faculty, and research staff
Last updated: January 2, 2026
01

What this site is (and is not)

What it is

  • A discoverability layer: browse datasets, scan metadata, identify fit-for-purpose sources
  • A contribution layer: propose datasets, suggest revisions, contribute insights
  • A governance layer: structured review process preserving academic integrity

What it is not

  • Not a data repository (does not host raw data files)
  • Not a promise of unrestricted access (some datasets are vendor/restricted)
  • Not a substitute for provider documentation (canonical definitions and variable meanings remain with the provider)
03

How to browse the dataset catalog

A productive "data discovery" workflow typically looks like this (a filtering sketch in code follows the list):

  1. Filter broadly by topical classification (e.g., Residential / Commercial / REITs) and geography
  2. Scan temporal coverage (Start/End year) and update frequency to validate feasibility for your study window
  3. Check access constraints (public vs. vendor vs. restricted), and whether the provider offers an API
  4. Review metadata depth and practical notes: unit of observation, key variables, known limitations, and expert insights (when available)
  5. Triangulate: compare alternative datasets and cross-check critical claims against provider documentation
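As a concrete illustration of steps 1-3, here is a minimal Python sketch that filters hypothetical catalog metadata; the field names are assumptions for illustration, not the catalog's actual schema:

```python
# Minimal sketch of steps 1-3 as a metadata filter. Field names
# (topic, start_year, access_type, ...) are illustrative assumptions.

catalog = [
    {"name": "Dataset A", "topic": "Residential", "start_year": 1995,
     "end_year": 2023, "access_type": "public", "has_api": True},
    {"name": "Dataset B", "topic": "Residential", "start_year": 2010,
     "end_year": 2024, "access_type": "vendor", "has_api": False},
]

def candidates(records, topic, study_start, study_end, allow_licensed=False):
    """Yield datasets whose coverage spans the study window and whose
    access conditions are acceptable."""
    for r in records:
        if r["topic"] != topic:
            continue
        if r["start_year"] > study_start or r["end_year"] < study_end:
            continue  # temporal coverage does not span the study window
        if r["access_type"] != "public" and not allow_licensed:
            continue  # avoid "false availability": listed but contract-bound
        yield r

print([r["name"] for r in candidates(catalog, "Residential", 2000, 2020)])
# -> ['Dataset A']  (Dataset B starts in 2010, after the study window opens)
```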

Interpreting key fields

Understanding these fields helps you quickly evaluate whether a dataset fits your research needs; one way such a record might be modeled in code appears after the list:

  • Dataset Name / Provider / URL: the minimal provenance triad
  • Access Type: helps avoid "false availability" (a dataset may exist but be contract-bound)
  • Start Year / End Year: a first-order constraint for identification strategies, policy timing, and sample construction
  • Geographic Coverage + Geographies Available: distinguishes conceptual coverage from the granularity of identifiers
  • Unit of Observation: crucial for merges and inference (property, transaction, loan, household, firm)
  • Key Variables / Documentation: signals replicability and downstream usability (and provides reference definitions)
  • Advantages / Limitations: where experienced users encode "the stuff that bites you"
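For illustration, here is one way such a record could be modeled; the class and field names below are assumptions, not the catalog's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Illustrative model only; names are assumptions, not the real schema.
    name: str                      # Dataset Name
    provider: str                  # Provider
    url: str                       # stable landing-page URL (provenance triad)
    access_type: str               # "public" | "vendor" | "restricted"
    start_year: int                # temporal coverage
    end_year: int
    geographic_coverage: str       # conceptual scope, e.g. "United States"
    geographies_available: list[str] = field(default_factory=list)  # e.g. ["state", "county", "tract"]
    unit_of_observation: str = ""  # property, transaction, loan, household, firm
    key_variables: list[str] = field(default_factory=list)
    advantages: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
```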
04

Contributing to the catalog

AREUEA Data Insights is meant to reflect the research community's working knowledge: not only what a dataset is, but how it behaves empirically and what it is good for in real research workflows.

Submitting a new dataset

Add a new dataset to the catalog with research-relevant insights. Strong submissions include advantages, limitations, alternatives, and practical details (frequency, coverage, and identifiers), plus a stable provider link for reference; a sketch of such a submission appears after the list.

  • Data description (what it measures, how it is constructed)
  • Advantages (what it is uniquely good for; where it performs well)
  • Limitations (coverage gaps, measurement error, duplicates, bias, known breakpoints)
  • Alternative datasets / substitutes (and how they differ in coverage, cleaning, and access)
  • Data details: frequency of the data, frequency of update, unit of observation, geographic coverage, and finest geographic identifier
  • Provider URL + documentation (landing page + codebook/data dictionary, for reference if available)
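For illustration, a complete submission assembled as structured data might look like the sketch below; the payload shape, keys, and values are all assumptions, not the actual submission format:

```python
# Illustrative shape of a new-dataset submission; keys are assumptions.
submission = {
    "name": "Example Transactions Panel",
    "provider": "Example Provider",
    "url": "https://provider.example/datasets/transactions",
    "description": "Deed-level transaction records built from county recorder filings.",
    "advantages": ["Near-universal coverage of arm's-length sales"],
    "limitations": ["Duplicates across re-recorded deeds", "Known coverage break circa 2013"],
    "alternatives": ["Vendor X deeds (cleaner, but licensed)"],
    "details": {
        "data_frequency": "daily",
        "update_frequency": "monthly",
        "unit_of_observation": "transaction",
        "geographic_coverage": "United States",
        "finest_geography": "parcel",
    },
    "documentation_url": "https://provider.example/docs/codebook",
}
```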

Submitting an expert insight

Add practical, researcher-to-researcher notes about how a dataset behaves in real projects. Insights can take the form of a recommendation, general feedback, a correction, or a question/clarification, and they are most useful when they include concrete details.

An expert insight is most valuable when it is (see the sketch after this list):

  • Methodologically specific (e.g., "coverage shifts after 2013 due to…", sample definition, known breaks)
  • Operational (how to merge, how identifiers behave, recommended filters, and common cleaning gotchas)
  • Evidence-backed (links to papers, provider docs, replication code, and any relevant validation evidence)
  • Explicit about scope (geography/time/segment where the insight holds, and where it may not apply)
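To make these criteria concrete, here is one hypothetical shape such an insight could take as structured data; every key and value is an illustrative assumption:

```python
# Illustrative shape of an expert insight; keys and values are assumptions.
insight = {
    "dataset": "Example Transactions Panel",
    "type": "correction",            # recommendation | feedback | correction | question
    "claim": "Coverage shifts after 2013 due to a change in the sampling frame.",
    "scope": {"geography": "US counties", "years": [2013, 2024], "segment": "residential"},
    "operational_notes": "Deduplicate on (parcel_id, recording_date) before merging.",
    "evidence": ["https://provider.example/docs/release-notes-2013"],
}
```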
05

Review and governance

AREUEA Data Insights supports two distinct review concepts:

1. Submission reviews (automated governance)

Reviews of new dataset submissions or revision submissions

2. Post-publication dataset reviews (manual QC)

Reviews of already published datasets for ongoing quality control

Submission review pipeline

When a submission is made, it moves through a structured pipeline (a toy code sketch follows the list):

  1. Triage & duplicate guard: detects whether the submission duplicates an existing dataset
  2. Reviewer assignment: assigns a minimum number of reviewers
  3. Reviewer evaluations: reviewers record a decision (approve / reject / needs info) and a rating
  4. Consensus calculation: aggregates reviews into an auto-decision
  5. Auto-publish (if approved): publishes a new dataset record or patches an existing dataset
  6. Audit trail: records before/after snapshots of changes
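For orientation only, a toy sketch of the pipeline's shape follows; every function and data structure in it is an illustrative stand-in, not the system's actual implementation:

```python
import random

# Toy, runnable sketch of the pipeline's shape; all functions are stand-ins.

def is_duplicate(sub, catalog):              # 1. triage & duplicate guard
    return sub["name"] in catalog

def assign_reviewers(pool, minimum=3):       # 2. reviewer assignment (minimum quorum)
    return random.sample(pool, minimum)

def evaluate(reviewer, sub):                 # 3. reviewer evaluation (stubbed)
    return {"tier": reviewer["tier"], "decision": "approve", "rating": 4}

def run_pipeline(sub, catalog, pool):
    if is_duplicate(sub, catalog):
        return "blocked: duplicate"
    reviews = [evaluate(r, sub) for r in assign_reviewers(pool)]
    return reviews  # steps 4-6 (consensus, publish, audit) are sketched below

pool = [{"name": f"reviewer{i}", "tier": t} for i, t in enumerate([1, 2, 3, 3])]
print(run_pipeline({"name": "New Dataset"}, {"Existing Dataset"}, pool))
```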

Trust-weighted consensus

Not all reviews carry identical weight. Review influence is scaled by the reviewer's trust-score tier:

  • Tier 3 (3× weight): senior experts, stewards
  • Tier 2 (2× weight): established reviewers
  • Tier 1 (1× weight): new or untested reviewers

The system aggregates the weighted approval total, the weighted rejection total, a weighted average rating, and the review count. A submission can be paused or blocked if risk flags are present (e.g., duplication, PII risk, unknown license) or if quorum thresholds are not met.
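A minimal sketch of how such a trust-weighted consensus could be computed follows; the quorum size, approval threshold, and decision rule are assumptions chosen for illustration:

```python
# Minimal sketch of trust-weighted consensus; thresholds and decision rules
# are illustrative assumptions, not the system's actual values.
TIER_WEIGHTS = {1: 1.0, 2: 2.0, 3: 3.0}

def consensus(reviews, min_reviews=3, approve_share=0.66, risk_flags=()):
    """Aggregate weighted reviews into an auto-decision."""
    if risk_flags:                       # e.g. duplication, PII risk, license unknown
        return "paused: " + ", ".join(risk_flags)
    if len(reviews) < min_reviews:       # quorum threshold not met
        return "pending: quorum not met"
    approve = sum(TIER_WEIGHTS[r["tier"]] for r in reviews if r["decision"] == "approve")
    reject  = sum(TIER_WEIGHTS[r["tier"]] for r in reviews if r["decision"] == "reject")
    total_w = sum(TIER_WEIGHTS[r["tier"]] for r in reviews)
    avg_rating = sum(TIER_WEIGHTS[r["tier"]] * r["rating"] for r in reviews) / total_w
    if approve / (approve + reject or 1) >= approve_share:
        return f"approve (weighted rating {avg_rating:.1f})"
    return "reject"

reviews = [
    {"tier": 3, "decision": "approve", "rating": 5},
    {"tier": 2, "decision": "approve", "rating": 4},
    {"tier": 1, "decision": "reject",  "rating": 2},
]
print(consensus(reviews))  # -> approve (weighted rating 4.2)
```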

Revisions are “patches,” not rewrites

Revision submissions are designed to preserve catalog integrity (a minimal code sketch follows the list):

  • A revision proposal is stored as Proposed JSON, intentionally omitting blank fields so the system does not overwrite existing values
  • On approval, the system applies a differential update (only changed fields are updated)
  • Every publish action generates an entry in the Revisions audit trail, enabling traceability
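A minimal sketch of the differential-update idea, assuming dictionary-shaped records with illustrative field names:

```python
# Minimal sketch of a differential ("patch") update: blank fields are omitted
# from the proposal so existing values are never overwritten. Dict-based
# records and field names are assumptions for illustration.

def apply_revision(current: dict, proposed: dict) -> tuple[dict, dict]:
    """Apply only the changed fields; return the new record and an audit entry."""
    changed = {k: v for k, v in proposed.items()
               if v not in (None, "") and current.get(k) != v}
    updated = {**current, **changed}
    audit = {"before": {k: current.get(k) for k in changed}, "after": changed}
    return updated, audit

record = {"name": "Dataset A", "end_year": 2022, "unit_of_observation": "loan"}
proposal = {"end_year": 2024, "unit_of_observation": ""}   # blank field is skipped
record, audit_entry = apply_revision(record, proposal)
print(record)       # end_year updated to 2024; unit_of_observation untouched
print(audit_entry)  # before/after snapshot for the Revisions audit trail
```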
06

Provenance, reproducibility, and citations

Provenance expectations

A record in the catalog is a scholarly pointer: it should help you identify and evaluate a data source quickly. However, dataset providers may change variable definitions, coverage, sampling frames, access pathways, and terms of use. Treat the catalog as a living index and verify critical details against provider documentation and archived materials.

Suggested citation practice

When referencing a dataset discovered via AREUEA Data Insights, a conservative academic practice is to cite (an illustrative template follows the list):

  • The dataset provider's canonical citation (if provided)
  • The dataset landing page URL
  • The date accessed
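For instance, a generic template along these lines (purely illustrative; defer to the provider's preferred citation format when one exists):

  Provider Name (Year). Dataset Title [data set]. URL of landing page. Accessed Month Day, Year.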

If you reference the catalog entry itself (e.g., as a curated index), cite it as you would a database or curated resource, including an access date.

07

Frequently asked questions

Is everything here publicly accessible?

No. Some datasets are vendor-licensed or restricted. The catalog aims to describe access conditions.

Can I trust the metadata?

Treat the catalog metadata as curated but fallible. The governance process emphasizes review and auditability, but providers and environments change.

How do I propose a correction?

Use the relevant submission mechanism (new dataset or revision). Provide evidence (links, papers, documentation) whenever possible.

How do I become a reviewer?

Register as a contributor, and (separately) join the reviewer pool through the governance process used by AREUEA administrators.

Ready to explore the catalog?

Start browsing datasets, contribute to the community, or register as a contributor to participate in the peer review process.