Understanding and Building the digest2 Population Model¶
This tutorial explains the population model that powers digest2 orbit classification, and shows how to inspect, understand, and rebuild it using pure Python.¶
digest2 classifies short-arc astrometric tracklets by comparing the observed motion against a 4-dimensional population model — a histogram over perihelion distance (q), eccentricity (e), inclination (i), and absolute magnitude (H). The model encodes both the predicted total asteroid population (from the S3M synthetic survey) and the known (already discovered) population, enabling the "NoID" scores that estimate the probability an unidentified object belongs to each orbit class.
This tutorial demonstrates:
- Reading and inspecting the existing model CSV
- Understanding the 4D histogram structure (Q, e, i, H dimensions)
- The 15 orbit class definitions and test functions
- Reading s3m.dat and examining the synthetic population
- Building a new model from s3m.dat + an orbit catalog
References:
- Keys et al. 2019, "The digest2 NEO Classification Code" (PASP 131, 064501)
- Grav et al. 2011, "The Pan-STARRS Synthetic Solar System Model (S3M)" (PASP 123, 423)
Install and Import¶
Install the digest2 package from PyPI. The population module is pure Python and requires only numpy.
# pip install digest2
import numpy as np
from digest2.population import (
# Constants
QPART, EPART, IPART, HPART,
QX, EX, IX, HX, D2CLASSES,
CLASS_ABBR, CLASS_HEADING, CLASS_TESTS,
# Binning
qeih_to_bin, h_to_bin,
# Class tests
is_neo, is_mpcint, is_inner_mb, is_jfc,
# I/O
read_s3m, read_model_csv, write_model_csv,
read_astorb, build_model,
)
The 4D Histogram Structure¶
The population model bins orbits along four dimensions:
| Dimension | Symbol | Bins | Range |
|---|---|---|---|
| Perihelion distance | Q (q) | 29 | 0 to 100 AU |
| Eccentricity | e | 8 | 0 to 1.1 |
| Inclination | i | 11 | 0 to 180 deg |
| Absolute magnitude | H | 18 | <6 to 25.5 |
Total bins: 29 × 8 × 11 × 18 = 45,936
The bin boundaries are not uniform — they are denser in regions where most asteroids reside.
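As a minimal illustration of how a non-uniform partition maps a value to a bin index, one can use np.searchsorted against the array of upper bin bounds. The edge values below are made up for illustration and are not the real QPART (digest2's own qeih_to_bin is demonstrated later in this tutorial):

```python
import numpy as np

# Hypothetical non-uniform upper bin bounds for perihelion distance
# (NOT the real QPART values) -- denser where most asteroids reside.
q_upper = np.array([0.4, 0.7, 1.0, 1.3, 1.67, 2.0, 2.5, 3.0, 5.0, 100.0])

def q_bin(q):
    """Index of the first upper bound exceeding q, or None if out of range."""
    idx = int(np.searchsorted(q_upper, q, side="right"))
    return idx if idx < len(q_upper) else None

print(q_bin(0.8))    # falls in the bin bounded above by 1.0
print(q_bin(200.0))  # beyond the last edge: not representable in the model
```

side="right" ensures a value exactly on a boundary lands in the next bin, matching the "bin iq means q < QPART[iq]" convention used below.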
print(f"Model dimensions: Q={QX}, e={EX}, i={IX}, H={HX}")
print(f"Total bins: {QX * EX * IX * HX:,}")
print(f"Number of orbit classes: {D2CLASSES}")
print()
print(f"Q partition ({QX} bins):")
print(f" {list(QPART)}")
print(f"e partition ({EX} bins):")
print(f" {list(EPART)}")
print(f"i partition ({IX} bins):")
print(f" {list(IPART)}")
print(f"H partition ({HX} bins):")
print(f" {list(HPART)}")
The 15 Orbit Classes¶
Each orbit is tested against 15 class definitions. An orbit can belong to multiple classes simultaneously (e.g., a NEO is always "MPC interesting").
print(f"{'Idx':>3s} {'Abbr':>4s} {'Heading':<15s} Description")
print("-" * 65)
descriptions = [
"q<1.3 OR e>=0.5 OR i>=40 OR Q>10",
"q < 1.3 AU",
"NEO with H <= 22",
"NEO with H <= 18",
"1.3 <= q < 1.67, Q > 1.58",
"1.78 < a < 2.0, e < 0.18, 16 < i < 34",
"q > 1.5, 2.2 < a < 2.45, 20 < i < 27",
"q > 1.67, 2.1 < a < 2.5, i varies",
"2.5 < a < 2.8, e < 0.35, 24 < i < 37",
"2.55 < a < 2.72, e < 0.25, 20 < i < 23.5",
"2.5 < a < 2.8, e < 0.45, i < 20",
"2.8 < a < 3.25, e < 0.4, i varies",
"3.9 < a < 4.02, e < 0.4, i < 18",
"5.05 < a < 5.35, e < 0.22, i < 38",
"2 < Tj < 3, q >= 1.3",
]
for idx in range(D2CLASSES):
print(f"{idx:3d} {CLASS_ABBR[idx]:>4s} {CLASS_HEADING[idx]:<15s} {descriptions[idx]}")
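The last class in the table, Jupiter-family comets, is defined by the Tisserand parameter with respect to Jupiter. A standard formula is Tj = a_J/a + 2 cos(i) sqrt((a/a_J)(1 - e^2)); the sketch below uses an approximate value for Jupiter's semi-major axis, and digest2's internal implementation may differ in constants:

```python
import math

A_JUP = 5.2038  # semi-major axis of Jupiter in AU (approximate)

def tisserand_jupiter(a, e, i_deg):
    """Tisserand parameter w.r.t. Jupiter (a in AU, i in degrees)."""
    return (A_JUP / a
            + 2.0 * math.cos(math.radians(i_deg))
            * math.sqrt(a / A_JUP * (1.0 - e * e)))

# The "Jupiter Family Comet" example orbit used below: q=1.5, e=0.7
a = 1.5 / (1.0 - 0.7)          # a = q / (1 - e) = 5.0 AU
tj = tisserand_jupiter(a, 0.7, 10.0)
print(f"Tj = {tj:.2f}")        # lands in the 2 < Tj < 3 JFC range
```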
Applying Class Tests¶
Each class test function takes (q, e, i, H) and returns True/False. Let's test a few example orbits.
# Example orbits
orbits = [
("Typical NEO", 0.8, 0.6, 12.0, 22.0),
("Main Belt", 2.3, 0.15, 8.0, 15.0),
("Jupiter Trojan", 4.8, 0.05, 10.0, 10.0),
("Mars Crosser", 1.5, 0.2, 10.0, 18.0),
("Jupiter Family Comet", 1.5, 0.7, 10.0, 12.0),
]
# Header
abbrs = " ".join(f"{a:>4s}" for a in CLASS_ABBR)
print(f"{'Orbit':<22s} {'q':>4s} {'e':>5s} {'i':>5s} {'H':>5s} {abbrs}")
print("-" * (40 + 5 * D2CLASSES))
for name, q, e, i, h in orbits:
flags = " ".join(
f"{' * ':>4s}" if CLASS_TESTS[c](q, e, i, h) else f"{'':>4s}"
for c in range(D2CLASSES)
)
print(f"{name:<22s} {q:4.1f} {e:5.2f} {i:5.1f} {h:5.1f} {flags}")
Binning Orbits¶
The qeih_to_bin() function maps orbital elements to 4D bin indices.
for name, q, e, i, h in orbits:
b = qeih_to_bin(q, e, i, h)
if b is not None:
iq, ie, ii, ih = b
print(f"{name:<22s} q={q:.1f} -> bin {iq:2d} (< {QPART[iq]:g}), "
f"e={e:.2f} -> bin {ie} (< {EPART[ie]:g}), "
f"i={i:.0f} -> bin {ii} (< {IPART[ii]:g}), "
f"H={h:.0f} -> bin {ih}")
else:
print(f"{name:<22s} OUT OF MODEL")
Reading the Existing Model¶
The model CSV (digest2.model.csv) contains four arrays:
- All,SS: Total predicted solar-system population (volume-scaled)
- Unk,SS: Unknown (undiscovered) population
- All,&lt;class&gt;: Total population for each of the 15 classes
- Unk,&lt;class&gt;: Unknown population for each class
Let's read the bundled model and examine its contents.
from pathlib import Path
# Path to the bundled model — adjust if running outside the repo
model_csv_path = Path("../population/digest2.model.csv")
if not model_csv_path.exists():
# Try the installed package location
import digest2
pkg_dir = Path(digest2.__file__).parent
model_csv_path = pkg_dir / "digest2_data" / "digest2.model.csv"
model = read_model_csv(str(model_csv_path))
print(f"All SS shape: {model['all_ss'].shape}")
print(f"Unk SS shape: {model['unk_ss'].shape}")
print(f"All class shape: {model['all_class'].shape}")
print(f"Unk class shape: {model['unk_class'].shape}")
print()
print(f"Total All SS (scaled): {model['all_ss'].sum():.1f}")
print(f"Total Unk SS (scaled): {model['unk_ss'].sum():.1f}")
print(f"Non-zero All SS bins: {np.count_nonzero(model['all_ss'])} / {model['all_ss'].size}")
print()
print("Per-class totals (scaled):")
for c in range(D2CLASSES):
all_total = model['all_class'][c].sum()
unk_total = model['unk_class'][c].sum()
print(f" {CLASS_ABBR[c]:>4s} ({CLASS_HEADING[c]:<15s}): "
f"All={all_total:12.1f} Unk={unk_total:12.1f}")
Reading the S3M Binned Data¶
The s3m.dat file contains the raw binned counts from the S3M synthetic survey — before volume scaling is applied. These represent the predicted total population of the solar system.
s3m_path = Path("../population/make_population/s3m.dat")
if s3m_path.exists():
all_ss_raw, all_class_raw = read_s3m(str(s3m_path))
print(f"Total synthetic orbits in model: {int(all_ss_raw.sum()):,}")
print(f"Non-zero bins: {np.count_nonzero(all_ss_raw)} / {all_ss_raw.size}")
print()
print("Per-class raw counts:")
for c in range(D2CLASSES):
count = int(all_class_raw[c].sum())
print(f" {CLASS_ABBR[c]:>4s} ({CLASS_HEADING[c]:<15s}): {count:>10,}")
else:
print("s3m.dat not found — skipping raw data inspection.")
print("(This file is only available in the source repository.)")
The Model-Building Pipeline¶
The full pipeline to create digest2.model.csv is:
S3M .s3m files ──[build_s3m]──> s3m.dat ──[build_model + astorb.dat]──> digest2.model.csv
- build_s3m(): Reads S3M synthetic orbit files and bins them into a 4D histogram, producing s3m.dat.
- build_model(): Reads s3m.dat (the "all" population), reads astorb.dat (the "known" population), computes the "unknown" population as the difference, applies volume scaling, and writes digest2.model.csv.
Volume scaling is needed because bin sizes are non-uniform. digest2 searches for the intersection of a 2D surface through 4D space, so each bin count is divided by the square root of the bin's 4D volume.
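A toy sketch of the scaling rule, using made-up two-bin edges per axis rather than the real partitions: the 4D volume of each bin is the product of its widths along the four axes, and each raw count is divided by the square root of that volume.

```python
import numpy as np

# Illustrative bin edges (NOT the real QPART/EPART/IPART/HPART)
q_edges = np.array([0.0, 1.3, 2.0])
e_edges = np.array([0.0, 0.5, 1.1])
i_edges = np.array([0.0, 30.0, 180.0])
h_edges = np.array([6.0, 18.0, 25.5])

# Bin widths along each axis
dq, de = np.diff(q_edges), np.diff(e_edges)
di, dh = np.diff(i_edges), np.diff(h_edges)

# 4D volume of every bin via broadcasting: shape (2, 2, 2, 2)
vol = (dq[:, None, None, None] * de[None, :, None, None]
       * di[None, None, :, None] * dh[None, None, None, :])

counts = np.full(vol.shape, 10.0)  # a uniform raw count, for illustration
scaled = counts / np.sqrt(vol)     # the volume-scaling rule
print(scaled[0, 0, 0, 0])          # small bins get up-weighted relative to big ones
```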
Below we demonstrate using build_model() with the existing s3m.dat and a small mock catalog. For a production model rebuild, you would use the real astorb.dat file from Lowell Observatory.
import tempfile, atexit, shutil
tmpdir = tempfile.mkdtemp(prefix="digest2_pop_tutorial_")
atexit.register(lambda: shutil.rmtree(tmpdir, ignore_errors=True))
if s3m_path.exists():
# Create a minimal mock astorb.dat with one known main-belt asteroid
# (In practice, astorb.dat has ~1.3 million entries)
import os
line = list(" " * 260)
# H = 15.0 (columns 42-46)
for j, ch in enumerate(" 15.0"):
line[42 + j] = ch
# i = 5.0 degrees (columns 147-156)
for j, ch in enumerate(" 5.0"):
line[147 + j] = ch
# e = 0.15 (columns 158-167)
for j, ch in enumerate(" 0.15"):
line[158 + j] = ch
# a = 2.5 AU (columns 169-180)
for j, ch in enumerate(" 2.5"):
line[169 + j] = ch
mock_catalog = os.path.join(tmpdir, "mock_astorb.dat")
with open(mock_catalog, "w") as f:
f.write("".join(line) + "\n")
output_csv = os.path.join(tmpdir, "test_model.csv")
stats = build_model(str(s3m_path), mock_catalog, output_csv)
print(f"S3M total orbits: {stats['s3m_total']:,}")
print(f"Catalog lines: {stats['catalog_stats']['lines']}")
print(f"Catalog usable: {stats['catalog_stats']['usable']}")
print(f"Output: {stats['output_path']}")
print()
# Verify the output is valid
new_model = read_model_csv(output_csv)
print(f"New model All SS total: {new_model['all_ss'].sum():.1f}")
print(f"New model Unk SS total: {new_model['unk_ss'].sum():.1f}")
else:
print("s3m.dat not found — cannot demonstrate build_model().")
print()
print("To rebuild the model, you need:")
print(" 1. s3m.dat (from build_s3m() or the C s3mbin program)")
print(" 2. astorb.dat (from ftp://ftp.lowell.edu/pub/elgb/astorb.dat.gz)")
print()
print("Usage:")
print(" from digest2.population import build_model")
print(' stats = build_model("s3m.dat", "astorb.dat", "digest2.model.csv")')
Summary¶
The digest2.population module provides a pure-Python implementation of the population model pipeline:
| Function | Purpose |
|---|---|
build_s3m() |
Bin S3M synthetic orbit files into s3m.dat |
read_s3m() |
Read a binned s3m.dat file |
read_astorb() |
Read and bin an astorb.dat known-object catalog |
build_model() |
Full pipeline: s3m.dat + catalog → digest2.model.csv |
read_model_csv() |
Read a model CSV back into arrays |
write_model_csv() |
Write model arrays to CSV |
qeih_to_bin() |
Convert orbital elements to 4D bin indices |
CLASS_TESTS |
List of 15 orbit class test functions |
Key concepts:
- The model is a 4D histogram over (q, e, i, H)
- 15 orbit classes are defined by cuts in orbital element space
- Volume scaling accounts for non-uniform bin sizes
- The "unknown" population (All minus Known) drives the NoID scores
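As a toy illustration of the last point (not the actual digest2 scoring, which integrates over the orbits consistent with a tracklet), the per-bin fraction of the population that is still undiscovered can be computed as Unk/All with a guard for empty bins:

```python
import numpy as np

# Made-up per-bin totals for three bins: mostly-known, fully-unknown, empty
all_counts = np.array([100.0, 50.0, 0.0])
unk_counts = np.array([40.0, 50.0, 0.0])

# Fraction of each bin's population that is still undiscovered;
# empty bins contribute zero rather than NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.where(all_counts > 0, unk_counts / all_counts, 0.0)
print(ratio)  # [0.4, 1.0, 0.0]
```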
For more information:
- digest2 documentation on GitHub
- Keys et al. 2019 — Algorithm description
For questions or feedback, contact the MPC via the Jira Helpdesk.