StarHash: unique, memorable, and deterministic names for astronomical objects

Published 31 Mar 2026 in astro-ph.IM and astro-ph.HE | (2603.29584v1)

Abstract: The naming of astronomical objects has represented among the most significant challenges in the record-keeping of the field since the very beginning. Long and unwieldy coordinate names, uninformative and ambiguous internal names, and the sheer volume of aliases accumulated for some of the most studied objects conspire to complicate our study of the celestial sphere. This paper proposes StarHash, a reproducible, open-source astronomical naming scheme based on the terrestrial concept of geohashing, but re-implemented from the ground up for the rigorous demands of astronomy. Every 3.2 arcsec patch of sky now has three words associated with it, enabling the precise localisation of astronomical sources, and an easily communicable and memorable identifier. A carefully selected wordlist reduces ambiguity due to plurals and homophones, whilst the use of format-preserving encryption minimises residual spatial correlation in StarHash-derived identifiers. Pre-computed names for several existing catalogues are provided, alongside a Python reference implementation for validation and integration into databases, transient brokers, and other similar projects. Although not intended to be the final word in the naming of astronomical objects, StarHash humbly provides a memorable alternative to the status quo, and is intended to spark a discussion about this most foundational of issues in astronomy.

Abstract PDF Upgrade to Chat

Authors (1)

T. L. Killestein

Summary

The paper introduces StarHash, detailing a method to generate memorable, deterministic astronomical object names via HEALPix grid partitioning and FF3-based index scrambling.
The methodology partitions the sky into ~3.2 arcsec pixels and maps each coordinate to a unique three-word phrase drawn from an optimized word list for error resilience.
Empirical evaluation on ~50 million coordinate pairs confirms the system’s robustness against transcription errors and spatial misidentification, ensuring efficiency for large surveys.

StarHash: Deterministic, Memorable, and Scrambled Phrase-Based Astronomy Object Identifiers

Motivation and Context

The proliferation of heterogeneous naming conventions for astronomical objects has been a persistent source of confusion in astronomical record keeping, data retrieval, and human communication. Traditional coordinate-based ("telephone number") identifiers are neither robust to transcription errors nor memorable for practitioners, while the accumulation of aliases across surveys and catalogues exacerbates cross-matching complexity. Survey-dependent "internal" naming schemes further fragment identification, especially for transients. The field faces amplified challenges with the advent of high-cadence surveys such as LSST, where the rate of new discoveries is expected to render legacy approaches untenable.

The "StarHash" system addresses these issues by leveraging principles from geohashing in a manner tailored for astronomical requirements. It offers unique, highly memorable, and deterministic string identifiers for arbitrary sky locations, underpinned by strong algorithmic and statistical guarantees.

StarHash Methodology

The StarHash algorithm partitions the celestial sphere using a HEALPix grid at $N_\text{side} = 2^{16}$ , yielding pixels of $\sim3.2$ arcsec, closely aligned with established association radii. Each location's coordinate is mapped to a unique integer pixel index via HEALPix's angular-to-pixel (ang2pix) transformation. To decorrelate neighboring pixels in coordinate space from semantic proximity in identifier space, StarHash applies FF3 format-preserving encryption to the zero-padded HEALPix index. While FF3 lacks cryptographic robustness, its use here is solely for bijective scrambling rather than security, thus the weaknesses of FF3 are not relevant in this domain.

Word assignment proceeds through repeated modular decomposition of the encrypted index into base wordlist indices, constructing a k-tuple of words (with $k=3$ in the baseline implementation). The word list, drawn from the EFF long wordlist (with astronomy-specific augmentation), optimizes for memorability and avoids problematic homophony or profanity. This mapping ensures each 3.2 arcsec region has a unique triplet of words associated with it, and the process is fully reversible given public parameters.

Empirical Evaluation

Rigorous numerical assessment underscores StarHash's resistance to both coordinate-to-identifier and identifier-to-coordinate confusions.

On a sample of $\sim$ 50 million sky coordinate pairs, uniformly drawn, the distribution of Hamming distances between corresponding StarHash strings exhibits no residual spatial correlation. Even for minimal angular separations, identifier triples differ in two or all three words with high probability, sharply reducing the risk of confusion via transcription errors.

Figure 1: Box plot of Hamming distance over $\sim$ 50M unique coordinate pairs uniformly distributed across the sky versus on-sky distance.

This robust decorrelation holds in the regime of localized, dense sky sampling near clusters, as shown for a Gaussian distribution centered on M31. The use of FF3-encrypted indices as the basis for word selection is crucial to this property.

Figure 2: Box plot of Hamming distance for $\sim$ 50M coordinate pairs with Normal (0.5 deg) distribution centered on Andromeda, versus on-sky distance.

Further, analysis using Levenshtein distance (edit distance) confirms the minimal semantic proximity—single character errors in a StarHash phrase are exceedingly unlikely to yield nearby sky locations, providing resilience against manual entry mistakes.

Figure 3: Levenshtein edit distances for large coordinate pair samples, for both uniform all-sky and localized samples, as a function of angular separation.

The system is computationally efficient, achieving throughput of approximately 6600 hashes per second single-threaded in Python, allowing large-scale catalog processing (e.g., Gaia DR3) on commodity hardware.

Examples and Use Cases

Visual demonstration of StarHash's assignment confirms its practical utility. The StarHash grid overlays on HST imagery of SN 1987A exhibit grid correspondence with astrophysically meaningful regions, and the resulting phrase names are distinctive and unambiguous for human communication.

Figure 4: HST image of SN 1987A with overlaid StarHash grid and names, visualizing phrase assignment to sky patches.

A projection of StarHash phrase assignments for a curated set of notable transients further highlights the improved mnemonic and communicative properties compared to existing IAU or survey-based identifiers.

Figure 5: All-sky projection of StarHash names for significant transients, shown alongside conventional names; dashed line indicates the Galactic plane.

Limitations and Extensibility

Several caveats are inherent in the current StarHash scheme. Its Anglocentric word list impairs universal accessibility, motivating future language-localized variants at the potential cost of universality. The chosen HEALPix resolution is fit for most optical applications but is suboptimal for high-precision domains; this is straightforwardly addressable by increasing $k$ or the wordlist size, but at the expense of identifier length.

Temporal information is not encoded, contrasting with some transient-specific naming conventions. Extensions for encoding discovery epoch or additional parameters are possible via additional words, with scaling behaviors favorable for the foreseeable future. Integration into legacy systems and community buy-in are recognized as significant, though surmountable, barriers.

Implications and Outlook

StarHash is a reproducible, open, and production-ready system for astronomical object naming, blending algorithmic rigor with enhanced usability. It stands as a practical remedy to systemic confusion in astro-nomenclature—resolving error sensitivity, memorability, and cross-survey reproducibility, with demonstrable benefit over both coordinate-based identifiers and non-open, proprietary systems like What3words (2603.29584).

The framework is broadly extensible (including to non-optical/astrometric applications), well-suited to pipeline integration, and supports large-scale parallelization. Future work may refine language inclusivity, phrase space scaling, or integration with temporal and contextual metadata.

Conclusion

StarHash offers a robust, deterministic, and memorable alternative to traditional astronomical object naming systems. Its mathematical foundation ensures that phrase identifiers are unambiguous, easily communicable, resistant to minor entry errors, and decorrelated from coordinate proximity, as substantiated by extensive empirical validation. The system is open source and its adoption could harmonize object identification practices across astronomy, fostering both practical efficiency and reduced cognitive burden for researchers.

Markdown Report Issue