EB5 Status

Data Center Methodology

This page describes how EB5Status acquires, parses, validates, and publishes EB-5 immigration data. We believe that methodology transparency is as important as the data itself. Every decision about how we handle source files, resolve ambiguities, and label confidence levels is documented here.

1. Trust Tier Definitions

Every data point on EB5Status carries a trust tier label that tells readers exactly where the number came from and how confident they should be in its accuracy. The five tiers are:

Blue: Official Data

Sourced directly from government agency publications, including USCIS, the U.S. Department of State, and the Department of Homeland Security. These numbers are reproduced exactly as published, with no modification or calculation.

Gray: Derived

Calculated from official data using disclosed methodology. Examples include approval rates (approvals divided by total completions) and net backlog change over time. The underlying formula is always documented.
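
As a minimal sketch of the two derived metrics named above, the Python below computes an approval rate and a net backlog change. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not real USCIS figures or the exact production code.

```python
from typing import Optional


def approval_rate(approvals: int, total_completions: int) -> Optional[float]:
    """Approvals divided by total completions; None when nothing was completed."""
    if total_completions == 0:
        return None  # avoid dividing by zero for an empty quarter
    return approvals / total_completions


def net_backlog_change(pending_start: int, pending_end: int) -> int:
    """Change in pending inventory between two dates (positive means the backlog grew)."""
    return pending_end - pending_start


# Illustrative inputs only, not real USCIS figures.
rate = approval_rate(approvals=180, total_completions=200)
change = net_backlog_change(pending_start=1500, pending_end=1420)
print(f"approval rate: {rate:.1%}, net backlog change: {change:+d}")
```

Returning None for a quarter with no completions keeps an undefined rate from being displayed as 0%.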

Green: FOIA Data

Obtained via Freedom of Information Act requests submitted by EB5Status. FOIA data is official government information that is not routinely published. Each FOIA response includes a request number and response date for verification.

Yellow: Estimated

Inferred from partial data or statistical models. Estimates are used when official data is incomplete, suppressed, or not yet published. The estimation method and confidence interval are disclosed alongside the value.

Orange: Editorial

Analysis, commentary, or interpretation by EB5Status. Editorial content reflects our informed assessment of trends, risks, or implications, and is clearly labeled to distinguish it from factual reporting.

2. Data Acquisition Process

All primary data is downloaded directly from official government websites, principally uscis.gov. We do not scrape websites, use third-party data aggregators, or rely on industry association reports as primary sources. Each download is logged with the exact URL, download timestamp, and file hash.
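
A download log entry of this kind could look roughly like the sketch below. It assumes SHA-256 as the hash algorithm and UTC ISO 8601 timestamps, neither of which is specified above, and the URL is a placeholder rather than a real USCIS file path.

```python
import hashlib
from datetime import datetime, timezone


def make_log_entry(url: str, content: bytes) -> dict:
    """Build one log record: exact URL, download timestamp, and file hash."""
    return {
        "url": url,
        # Assumption: timestamps are stored in UTC, ISO 8601 format.
        "downloaded_at": datetime.now(timezone.utc).isoformat(),
        # Assumption: SHA-256; the hash algorithm is not named in the text.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Placeholder URL and bytes; a real entry would hash the downloaded file.
entry = make_log_entry("https://www.uscis.gov/path/to/file.xlsx", b"example file bytes")
print(entry)
```

Anyone who downloads the same file can recompute the hash and compare it with the logged value.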

The full list of source files, including direct download links, is available on the Source Registry page.

3. Parsing Methodology

USCIS publishes data in Excel (.xlsx) format. Our parsing pipeline follows a three-step process:

  1. Excel to CSV normalization. Raw Excel files are converted to CSV with consistent encoding (UTF-8), date formatting (ISO 8601), and numeric precision.
  2. Field mapping. Column headers in USCIS files vary across releases. We maintain a mapping table that translates each source column to our standardized field names as defined in the data dictionary.
  3. Data type validation. Every field is validated against its expected type (integer, date, percentage, suppression code). Values that fail validation are flagged for manual review rather than silently dropped.
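
Steps 2 and 3 can be sketched as follows. The header names in FIELD_MAP and the standardized field names are hypothetical stand-ins for the real mapping table and data dictionary, and only integer counts and suppression codes are validated here.

```python
# Hypothetical mapping from varying source headers to standardized field names.
FIELD_MAP = {
    "Forms Received": "receipts",
    "Receipts": "receipts",
    "Approved": "approvals",
    "Approvals": "approvals",
}
SUPPRESSION_CODES = {"D", "H"}


def map_fields(row: dict) -> dict:
    """Rename known source headers; unknown headers pass through unchanged."""
    return {FIELD_MAP.get(k.strip(), k.strip()): v for k, v in row.items()}


def validate_count(value: str):
    """Return (parsed_value, ok). Suppression codes are preserved, not imputed."""
    text = value.strip()
    if text in SUPPRESSION_CODES:
        return text, True
    cleaned = text.replace(",", "")
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned), True
    return value, False  # keep the raw value so a reviewer can inspect it


raw_row = {"Forms Received": "1,204", "Approved": "D", "Denied": "n/a"}
parsed, flagged = {}, []
for name, raw in map_fields(raw_row).items():
    parsed[name], ok = validate_count(raw)
    if not ok:
        flagged.append(name)  # flagged for manual review, never dropped

print(parsed, flagged)
```

Invalid values stay in the output alongside a flag, which mirrors the rule that nothing is silently dropped.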

4. Quality Controls

Before any dataset is published, it passes through the following validation checks:

  • Sum validation. Subtotals are checked against the sum of their component rows. Any mismatch triggers a manual review.
  • Cross-file consistency. When the same metric appears in multiple source files (for example, total I-526E receipts in both the quarterly report and the TEA breakdown), we verify that the values match.
  • Suppression code handling. USCIS uses "D" and "H" to suppress values below statistical thresholds. We preserve these codes rather than imputing values, and display them with an asterisk and explanatory footnote.
  • Temporal continuity. Quarter-over-quarter changes that exceed expected ranges are flagged for review. This catches both data errors and genuine trend breaks.
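
Two of the checks above, sum validation and temporal continuity, can be illustrated with the short sketch below. The function names are hypothetical, and the 50% change threshold is an assumed value for the example, not the actual expected range used in review.

```python
SUPPRESSION_CODES = {"D", "H"}


def check_subtotal(components: list, subtotal) -> str:
    """Return 'ok', 'mismatch', or 'unverifiable' when a suppressed value blocks the check."""
    if subtotal in SUPPRESSION_CODES or any(c in SUPPRESSION_CODES for c in components):
        return "unverifiable"  # suppressed values are preserved, never imputed
    return "ok" if sum(components) == subtotal else "mismatch"


def exceeds_expected_range(previous: int, current: int, max_relative_change: float = 0.5) -> bool:
    """Flag a quarter-over-quarter change larger than the assumed threshold."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / previous > max_relative_change


print(check_subtotal([10, 20, 30], 60))
print(check_subtotal([10, "D", 30], 60))
print(exceeds_expected_range(100, 160))
```

A flag from either function means the value goes to manual review; it does not mean the value is wrong, since genuine trend breaks trip the same check.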

5. Versioning System

EB5Status uses two separate versioning schemes to track changes:

Dataset Versions

Format: eb5_datasets_v{YYYY}.{MM}.{seq}

The year and month indicate when the dataset was compiled. The sequence number increments for multiple releases within the same month (for example, a correction or addition).
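
For illustration, a version tag in this format could be parsed and incremented as sketched below. The sketch assumes a four-digit year, a zero-padded two-digit month, and a sequence that starts at 1 each month; the format string above does not state these details, and the function names are hypothetical.

```python
import re

# Assumption: four-digit year, two-digit month, unpadded sequence number.
VERSION_RE = re.compile(r"^eb5_datasets_v(\d{4})\.(\d{2})\.(\d+)$")


def parse_dataset_version(tag: str) -> tuple:
    """Split a dataset version tag into (year, month, seq)."""
    match = VERSION_RE.match(tag)
    if match is None:
        raise ValueError(f"not a dataset version tag: {tag!r}")
    year, month, seq = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {tag!r}")
    return year, month, seq


def next_dataset_version(previous: str, year: int, month: int) -> str:
    """Increment seq within the same month; otherwise restart at 1 (assumed)."""
    prev_year, prev_month, prev_seq = parse_dataset_version(previous)
    seq = prev_seq + 1 if (prev_year, prev_month) == (year, month) else 1
    return f"eb5_datasets_v{year:04d}.{month:02d}.{seq}"


print(next_dataset_version("eb5_datasets_v2025.09.2", 2025, 9))
```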

Methodology Versions

Format: calculations_v{major}.{minor}

The major version changes when a calculation formula is revised or a new derived metric is introduced. The minor version changes for clarifications or documentation improvements that do not affect output values.
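
The bump rule can be sketched as a small function. The name bump_methodology_version is hypothetical, and resetting the minor number to 0 on a major bump is an assumption that the rule above does not spell out.

```python
PREFIX = "calculations_v"


def bump_methodology_version(current: str, affects_output: bool) -> str:
    """Major bump for formula or metric changes, minor bump for documentation-only changes."""
    if not current.startswith(PREFIX):
        raise ValueError(f"not a methodology version: {current!r}")
    major, minor = (int(part) for part in current[len(PREFIX):].split("."))
    if affects_output:
        return f"{PREFIX}{major + 1}.0"  # assumption: minor resets to 0
    return f"{PREFIX}{major}.{minor + 1}"


print(bump_methodology_version("calculations_v2.3", affects_output=True))
print(bump_methodology_version("calculations_v2.3", affects_output=False))
```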

6. Correction Policy

When an error is discovered in published data, we follow a documented correction process:

  1. The incorrect value is replaced with the corrected value on the affected page.
  2. A correction entry is logged in the Data Changelog, recording the date, affected field, before value, after value, and reason for correction.
  3. If the correction is material (changes a headline metric or affects a conclusion), a note is added to the page for 30 days.

The Data Changelog is append-only. We never silently revise published numbers.
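
One way to model a changelog entry and the 30-day note is sketched below. The class and attribute names are hypothetical; the recorded fields follow the correction process described above, and the sample values are invented for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)  # frozen: an entry cannot be edited after it is logged
class Correction:
    logged_on: date
    affected_field: str
    before: str
    after: str
    reason: str
    material: bool = False

    @property
    def note_expires(self) -> Optional[date]:
        """Material corrections carry a page note for 30 days."""
        return self.logged_on + timedelta(days=30) if self.material else None


class Changelog:
    """Append-only log: entries can be added and read, but not changed or removed."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, entry: Correction) -> None:
        self._entries.append(entry)

    @property
    def entries(self) -> tuple:
        return tuple(self._entries)  # read-only view


log = Changelog()
# Invented example values, not a real correction.
log.append(Correction(date(2025, 1, 10), "example_metric", "1,204", "1,240",
                      "transcription error", material=True))
print(log.entries[0].note_expires)
```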

7. Known Limitations

We believe in disclosing limitations alongside data. The following constraints apply to the current EB5Status dataset:

  1. FY2025 Q3 is the latest available quarter. USCIS publishes quarterly data with an approximate 90-day lag after the quarter closes.
  2. Historical quarterly data (pre-FY2025) is not yet included in our structured dataset. We are backfilling this data as resources allow.
  3. The I-485 pending inventory does not break down by set-aside category (rural, high unemployment area, infrastructure). Only the aggregate EB-5 category total is available.
  4. Suppressed values (marked D or H) indicate counts below 11 that USCIS withholds for privacy reasons. These values cannot be precisely determined from public data.
  5. Visa issuance data comes from the U.S. Department of State, not USCIS. It is tracked separately and may have different reporting periods and lag times.
  6. Processing time figures represent the median for all completions during the quarter. Individual case timelines may vary significantly based on case complexity, requests for evidence, and other factors.
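
Limitation 4 means that a total built from rows containing suppressed values can only be bounded, not computed exactly. The sketch below shows that arithmetic, assuming each suppressed count lies between 0 and 10 inclusive. The lower bound of 0 is an assumption, since the text only says such counts are below 11, and the function name and sample values are hypothetical.

```python
SUPPRESSION_CODES = ("D", "H")


def total_bounds(values: list, low: int = 0, high: int = 10) -> tuple:
    """Return (minimum, maximum) possible total when some counts are suppressed."""
    known = sum(v for v in values if isinstance(v, int))
    suppressed = sum(1 for v in values if v in SUPPRESSION_CODES)
    return known + suppressed * low, known + suppressed * high


# Invented example row values, not real USCIS figures.
print(total_bounds([120, "D", 45, "H"]))
```

Bounds like these would fall under the Estimated tier rather than Official Data, because they are inferred from partial data.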

8. Reproducibility

We publish source file URLs so anyone can download the same files and verify our numbers independently. This is a deliberate design choice. If our data cannot be independently reproduced from public sources, it is not trustworthy.

The Source Registry page provides direct download links for every file we use. If you find any value on EB5Status that does not match the source file, we want to know about it.
