Understanding and Building the digest2 Population Model¶
This tutorial explains the population model that powers digest2 orbit classification, and shows how to inspect, understand, and rebuild it using pure Python.¶
digest2 classifies short-arc astrometric tracklets by comparing the observed motion against a 4-dimensional population model — a histogram over perihelion distance (q), eccentricity (e), inclination (i), and absolute magnitude (H). The model encodes both the predicted total asteroid population (from the S3M synthetic survey) and the known (already discovered) population, enabling the "NoID" scores that estimate the probability an unidentified object belongs to each orbit class.
This tutorial demonstrates:
- Reading and inspecting the existing model CSV
- Understanding the 4D histogram structure (Q, e, i, H dimensions)
- The 15 orbit class definitions and test functions
- Reading s3m.dat and examining the synthetic population
- Building a new model from s3m.dat + an orbit catalog
References:
- Keys et al. 2019, "The digest2 NEO Classification Code" (PASP 131, 064501)
- Grav et al. 2011, "The Pan-STARRS Synthetic Solar System Model (S3M)" (PASP 123, 423)
Install and Import¶
Install the digest2 package from PyPI. The population module is pure Python and requires only numpy.
# pip install digest2
import numpy as np
from digest2.population import (
# Constants
QPART, EPART, IPART, HPART,
QX, EX, IX, HX, D2CLASSES,
CLASS_ABBR, CLASS_HEADING, CLASS_TESTS,
# Binning
qeih_to_bin, h_to_bin,
# Class tests
is_neo, is_mpcint, is_inner_mb, is_jfc,
# I/O
read_s3m, read_model_csv, write_model_csv,
read_astorb, build_model,
)
The 4D Histogram Structure¶
The population model bins orbits along four dimensions:
| Dimension | Symbol | Bins | Range |
|---|---|---|---|
| Perihelion distance | Q (q) | 29 | 0 to 100 AU |
| Eccentricity | e | 8 | 0 to 1.1 |
| Inclination | i | 11 | 0 to 180 deg |
| Absolute magnitude | H | 18 | <6 to 25.5 |
Total bins: 29 × 8 × 11 × 18 = 45,936
The bin boundaries are not uniform — they are denser in regions where most asteroids reside.
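As a minimal illustration of how a non-uniform partition maps a value to a bin index, one can use np.searchsorted against the array of upper bin bounds. The edge values below are made up for illustration and are not the real QPART (digest2's own qeih_to_bin is demonstrated later in this tutorial):

```python
import numpy as np

# Hypothetical non-uniform upper bin bounds for perihelion distance
# (NOT the real QPART values) -- denser where most asteroids reside.
q_upper = np.array([0.4, 0.7, 1.0, 1.3, 1.67, 2.0, 2.5, 3.0, 5.0, 100.0])

def q_bin(q):
    """Index of the first upper bound exceeding q, or None if out of range."""
    idx = int(np.searchsorted(q_upper, q, side="right"))
    return idx if idx < len(q_upper) else None

print(q_bin(0.8))    # falls in the bin bounded above by 1.0
print(q_bin(200.0))  # beyond the last edge: not representable in the model
```

side="right" ensures a value exactly on a boundary lands in the next bin, matching the "bin iq means q < QPART[iq]" convention used below.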
print(f"Model dimensions: Q={QX}, e={EX}, i={IX}, H={HX}")
print(f"Total bins: {QX * EX * IX * HX:,}")
print(f"Number of orbit classes: {D2CLASSES}")
print()
print(f"Q partition ({QX} bins):")
print(f" {list(QPART)}")
print(f"e partition ({EX} bins):")
print(f" {list(EPART)}")
print(f"i partition ({IX} bins):")
print(f" {list(IPART)}")
print(f"H partition ({HX} bins):")
print(f" {list(HPART)}")
The 15 Orbit Classes¶
Each orbit is tested against 15 class definitions. An orbit can belong to multiple classes simultaneously (e.g., a NEO is always "MPC interesting").
print(f"{'Idx':>3s} {'Abbr':>4s} {'Heading':<15s} Description")
print("-" * 65)
descriptions = [
"q<1.3 OR e>=0.5 OR i>=40 OR Q>10",
"q < 1.3 AU",
"NEO with H <= 22",
"NEO with H <= 18",
"1.3 <= q < 1.67, Q > 1.58",
"1.78 < a < 2.0, e < 0.18, 16 < i < 34",
"q > 1.5, 2.2 < a < 2.45, 20 < i < 27",
"q > 1.67, 2.1 < a < 2.5, i varies",
"2.5 < a < 2.8, e < 0.35, 24 < i < 37",
"2.55 < a < 2.72, e < 0.25, 20 < i < 23.5",
"2.5 < a < 2.8, e < 0.45, i < 20",
"2.8 < a < 3.25, e < 0.4, i varies",
"3.9 < a < 4.02, e < 0.4, i < 18",
"5.05 < a < 5.35, e < 0.22, i < 38",
"2 < Tj < 3, q >= 1.3",
]
for idx in range(D2CLASSES):
print(f"{idx:3d} {CLASS_ABBR[idx]:>4s} {CLASS_HEADING[idx]:<15s} {descriptions[idx]}")
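The last class in the table, Jupiter-family comets, is defined by the Tisserand parameter with respect to Jupiter. A standard formula is Tj = a_J/a + 2 cos(i) sqrt((a/a_J)(1 - e^2)); the sketch below uses an approximate value for Jupiter's semi-major axis, and digest2's internal implementation may differ in constants:

```python
import math

A_JUP = 5.2038  # semi-major axis of Jupiter in AU (approximate)

def tisserand_jupiter(a, e, i_deg):
    """Tisserand parameter w.r.t. Jupiter (a in AU, i in degrees)."""
    return (A_JUP / a
            + 2.0 * math.cos(math.radians(i_deg))
            * math.sqrt(a / A_JUP * (1.0 - e * e)))

# The "Jupiter Family Comet" example orbit used below: q=1.5, e=0.7
a = 1.5 / (1.0 - 0.7)          # a = q / (1 - e) = 5.0 AU
tj = tisserand_jupiter(a, 0.7, 10.0)
print(f"Tj = {tj:.2f}")        # lands in the 2 < Tj < 3 JFC range
```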
Applying Class Tests¶
Each class test function takes (q, e, i, H) and returns True/False. Let's test a few example orbits.
# Example orbits
orbits = [
("Typical NEO", 0.8, 0.6, 12.0, 22.0),
("Main Belt", 2.3, 0.15, 8.0, 15.0),
("Jupiter Trojan", 4.8, 0.05, 10.0, 10.0),
("Mars Crosser", 1.5, 0.2, 10.0, 18.0),
("Jupiter Family Comet", 1.5, 0.7, 10.0, 12.0),
]
# Header
abbrs = " ".join(f"{a:>4s}" for a in CLASS_ABBR)
print(f"{'Orbit':<22s} {'q':>4s} {'e':>5s} {'i':>5s} {'H':>5s} {abbrs}")
print("-" * (40 + 5 * D2CLASSES))
for name, q, e, i, h in orbits:
flags = " ".join(
f"{' * ':>4s}" if CLASS_TESTS[c](q, e, i, h) else f"{'':>4s}"
for c in range(D2CLASSES)
)
print(f"{name:<22s} {q:4.1f} {e:5.2f} {i:5.1f} {h:5.1f} {flags}")
Binning Orbits¶
The qeih_to_bin() function maps orbital elements to 4D bin indices.
for name, q, e, i, h in orbits:
b = qeih_to_bin(q, e, i, h)
if b is not None:
iq, ie, ii, ih = b
print(f"{name:<22s} q={q:.1f} -> bin {iq:2d} (< {QPART[iq]:g}), "
f"e={e:.2f} -> bin {ie} (< {EPART[ie]:g}), "
f"i={i:.0f} -> bin {ii} (< {IPART[ii]:g}), "
f"H={h:.0f} -> bin {ih}")
else:
print(f"{name:<22s} OUT OF MODEL")
Reading the Existing Model¶
The model CSV (digest2.model.csv) contains four arrays:
- All,SS: Total predicted solar-system population (volume-scaled)
- Unk,SS: Unknown (undiscovered) population
- All,&lt;class&gt;: Total population for each of the 15 classes
- Unk,&lt;class&gt;: Unknown population for each class
Let's read the bundled model and examine its contents.
from pathlib import Path
# Path to the bundled model — adjust if running outside the repo
model_csv_path = Path("../population/digest2.model.csv")
if not model_csv_path.exists():
# Try the installed package location
import digest2
pkg_dir = Path(digest2.__file__).parent
model_csv_path = pkg_dir / "digest2_data" / "digest2.model.csv"
model = read_model_csv(str(model_csv_path))
print(f"All SS shape: {model['all_ss'].shape}")
print(f"Unk SS shape: {model['unk_ss'].shape}")
print(f"All class shape: {model['all_class'].shape}")
print(f"Unk class shape: {model['unk_class'].shape}")
print()
print(f"Total All SS (scaled): {model['all_ss'].sum():.1f}")
print(f"Total Unk SS (scaled): {model['unk_ss'].sum():.1f}")
print(f"Non-zero All SS bins: {np.count_nonzero(model['all_ss'])} / {model['all_ss'].size}")
print()
print("Per-class totals (scaled):")
for c in range(D2CLASSES):
all_total = model['all_class'][c].sum()
unk_total = model['unk_class'][c].sum()
print(f" {CLASS_ABBR[c]:>4s} ({CLASS_HEADING[c]:<15s}): "
f"All={all_total:12.1f} Unk={unk_total:12.1f}")
Reading the S3M Binned Data¶
The s3m.dat file contains the raw binned counts from the S3M synthetic survey — before volume scaling is applied. These represent the predicted total population of the solar system.
s3m_path = Path("../population/make_population/s3m.dat")
if s3m_path.exists():
all_ss_raw, all_class_raw = read_s3m(str(s3m_path))
print(f"Total synthetic orbits in model: {int(all_ss_raw.sum()):,}")
print(f"Non-zero bins: {np.count_nonzero(all_ss_raw)} / {all_ss_raw.size}")
print()
print("Per-class raw counts:")
for c in range(D2CLASSES):
count = int(all_class_raw[c].sum())
print(f" {CLASS_ABBR[c]:>4s} ({CLASS_HEADING[c]:<15s}): {count:>10,}")
else:
print("s3m.dat not found — skipping raw data inspection.")
print("(This file is only available in the source repository.)")
The Model-Building Pipeline¶
The full pipeline to create digest2.model.csv is:
S3M .s3m files ──[build_s3m]──> s3m.dat ──[build_model + astorb.dat]──> digest2.model.csv
- build_s3m(): Reads S3M synthetic orbit files and bins them into a 4D histogram, producing s3m.dat.
- build_model(): Reads s3m.dat (the "all" population), reads astorb.dat (the "known" population), computes the "unknown" population as the difference, applies volume scaling, and writes digest2.model.csv.
Volume scaling is needed because bin sizes are non-uniform. digest2 searches for the intersection of a 2D surface through 4D space, so each bin count is divided by the square root of the bin's 4D volume.
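A toy sketch of the scaling rule, using made-up two-bin edges per axis rather than the real partitions: the 4D volume of each bin is the product of its widths along the four axes, and each raw count is divided by the square root of that volume.

```python
import numpy as np

# Illustrative bin edges (NOT the real QPART/EPART/IPART/HPART)
q_edges = np.array([0.0, 1.3, 2.0])
e_edges = np.array([0.0, 0.5, 1.1])
i_edges = np.array([0.0, 30.0, 180.0])
h_edges = np.array([6.0, 18.0, 25.5])

# Bin widths along each axis
dq, de = np.diff(q_edges), np.diff(e_edges)
di, dh = np.diff(i_edges), np.diff(h_edges)

# 4D volume of every bin via broadcasting: shape (2, 2, 2, 2)
vol = (dq[:, None, None, None] * de[None, :, None, None]
       * di[None, None, :, None] * dh[None, None, None, :])

counts = np.full(vol.shape, 10.0)  # a uniform raw count, for illustration
scaled = counts / np.sqrt(vol)     # the volume-scaling rule
print(scaled[0, 0, 0, 0])          # small bins get up-weighted relative to big ones
```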
Below we demonstrate using build_model() with the existing s3m.dat and a small mock catalog. For a production model rebuild, you would use the real astorb.dat file from Lowell Observatory.
import tempfile, atexit, shutil
tmpdir = tempfile.mkdtemp(prefix="digest2_pop_tutorial_")
atexit.register(lambda: shutil.rmtree(tmpdir, ignore_errors=True))
if s3m_path.exists():
# Create a minimal mock astorb.dat with one known main-belt asteroid
# (In practice, astorb.dat has ~1.3 million entries)
import os
line = list(" " * 260)
# H = 15.0 (columns 42-46)
for j, ch in enumerate(" 15.0"):
line[42 + j] = ch
# i = 5.0 degrees (columns 147-156)
for j, ch in enumerate(" 5.0"):
line[147 + j] = ch
# e = 0.15 (columns 158-167)
for j, ch in enumerate(" 0.15"):
line[158 + j] = ch
# a = 2.5 AU (columns 169-180)
for j, ch in enumerate(" 2.5"):
line[169 + j] = ch
mock_catalog = os.path.join(tmpdir, "mock_astorb.dat")
with open(mock_catalog, "w") as f:
f.write("".join(line) + "\n")
output_csv = os.path.join(tmpdir, "test_model.csv")
stats = build_model(str(s3m_path), mock_catalog, output_csv)
print(f"S3M total orbits: {stats['s3m_total']:,}")
print(f"Catalog lines: {stats['catalog_stats']['lines']}")
print(f"Catalog usable: {stats['catalog_stats']['usable']}")
print(f"Output: {stats['output_path']}")
print()
# Verify the output is valid
new_model = read_model_csv(output_csv)
print(f"New model All SS total: {new_model['all_ss'].sum():.1f}")
print(f"New model Unk SS total: {new_model['unk_ss'].sum():.1f}")
else:
print("s3m.dat not found — cannot demonstrate build_model().")
print()
print("To rebuild the model, you need:")
print(" 1. s3m.dat (from build_s3m() or the C s3mbin program)")
print(" 2. astorb.dat (from ftp://ftp.lowell.edu/pub/elgb/astorb.dat.gz)")
print()
print("Usage:")
print(" from digest2.population import build_model")
print(' stats = build_model("s3m.dat", "astorb.dat", "digest2.model.csv")')
Summary¶
The digest2.population module provides a pure-Python implementation of the population model pipeline:
| Function | Purpose |
|---|---|
build_s3m() |
Bin S3M synthetic orbit files into s3m.dat |
read_s3m() |
Read a binned s3m.dat file |
read_astorb() |
Read and bin an astorb.dat known-object catalog |
build_model() |
Full pipeline: s3m.dat + catalog → digest2.model.csv |
read_model_csv() |
Read a model CSV back into arrays |
write_model_csv() |
Write model arrays to CSV |
qeih_to_bin() |
Convert orbital elements to 4D bin indices |
CLASS_TESTS |
List of 15 orbit class test functions |
Key concepts:
- The model is a 4D histogram over (q, e, i, H)
- 15 orbit classes are defined by cuts in orbital element space
- Volume scaling accounts for non-uniform bin sizes
- The "unknown" population (All minus Known) drives the NoID scores
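As a toy illustration of the last point (not the actual digest2 scoring, which integrates over the orbits consistent with a tracklet), the per-bin fraction of the population that is still undiscovered can be computed as Unk/All with a guard for empty bins:

```python
import numpy as np

# Made-up per-bin totals for three bins: mostly-known, fully-unknown, empty
all_counts = np.array([100.0, 50.0, 0.0])
unk_counts = np.array([40.0, 50.0, 0.0])

# Fraction of each bin's population that is still undiscovered;
# empty bins contribute zero rather than NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.where(all_counts > 0, unk_counts / all_counts, 0.0)
print(ratio)  # [0.4, 1.0, 0.0]
```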
For more information:
- digest2 documentation on GitHub
- Keys et al. 2019 — Algorithm description
For questions or feedback, contact the MPC via the Jira Helpdesk.