AstroLLM

A domain-specialized language model for astronomy and astrophysics

15M+ papers · 20.5M objects · 5,700+ exoplanets

An open intelligence layer for astronomy — retrieval-grounded, tool-integrated, and built in the open.

Not a benchmark model.
A knowledge system.

AstroLLM connects the world's astronomical literature, databases, and observations through specialized language models that can retrieve, reason, cite, and teach.

Most existing astronomy LLMs are optimized for benchmark scores. AstroLLM optimizes for the workflows researchers actually use: finding papers, resolving objects, explaining evidence, and teaching at the right level.

  • Retrieval-grounded: Every answer cites real papers from NASA ADS
  • Tool-integrated: Queries SIMBAD, the NASA Exoplanet Archive, and other astronomical databases live
  • Openly built: Models, data, evaluation, and the training pipeline will all be released as open source
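
To make the "every answer cites real papers" goal concrete, here is a minimal sketch of a citation-coverage check. It is not AstroLLM's actual implementation: the function name `uncited_sentences`, the bracketed citation style, and the regular expression are all assumptions made for illustration. The pattern relies only on the fact that ADS bibcodes are 19 characters long and start with a four-digit year, so it checks shape and does not verify that a paper exists. The sample draft is illustrative text, with the second sentence left deliberately uncited.

```python
import re

# ADS bibcodes are 19 characters long and begin with a four-digit year.
# This pattern only checks the shape; it does not confirm the paper exists.
BIBCODE = re.compile(r"\d{4}[A-Za-z0-9.&]{14}[A-Z.]")


def uncited_sentences(answer: str) -> list[str]:
    """Return the sentences in `answer` that carry no bibcode-shaped token."""
    # Split on sentence-ending punctuation followed by whitespace, so the
    # dots inside a bibcode do not break a sentence apart.
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if s and not BIBCODE.search(s)]


# Illustrative draft answer: the second sentence is deliberately uncited.
draft = (
    "TRAPPIST-1 hosts seven known planets [2017Natur.542..456G]. "
    "Planet e has a thick atmosphere."
)
print(uncited_sentences(draft))  # ['Planet e has a thick atmosphere.']
```

A check like this could run before an answer is shown, with flagged sentences either sent back for retrieval or marked as unsupported.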

Retrieve. Reason. Cite.

  1. Query

    "What do we know about TRAPPIST-1e's atmosphere?"

  2. Retrieve

    Search 15M+ papers via NASA ADS

    Resolve objects via SIMBAD (20.5M astronomical objects)

    Cross-reference with NASA Exoplanet Archive

  3. Reason

    Fine-tuned astronomy model interprets evidence

    Adapts explanation depth to your level

  4. Cite

    Every claim linked to source papers

    ADS bibcodes trace back to the literature
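
The Retrieve step above can be sketched as two HTTP requests, one to the NASA ADS search API and one to the NASA Exoplanet Archive TAP service. This is a hedged illustration, not the project's code: the helper names, the chosen return fields, and the use of the `pscomppars` table are assumptions, and `token` stands for an ADS API token that you supply yourself. The requests are only constructed here, not sent, and SIMBAD resolution is left out to keep the sketch short.

```python
from urllib.parse import urlencode
from urllib.request import Request

ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"
EXOPLANET_TAP_URL = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"


def build_ads_request(query: str, token: str, rows: int = 5) -> Request:
    """Build (but do not send) an ADS literature search request."""
    # `fl` lists the fields to return; bibcodes are what the Cite step needs.
    params = urlencode({"q": query, "fl": "bibcode,title,year", "rows": rows})
    return Request(
        f"{ADS_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


def build_exoplanet_request(planet_name: str) -> Request:
    """Build (but do not send) a TAP query for one planet's parameters."""
    # Double any single quotes so the name is a valid ADQL string literal.
    safe_name = planet_name.replace("'", "''")
    adql = (
        "select pl_name, pl_rade, pl_orbper "
        f"from pscomppars where pl_name = '{safe_name}'"
    )
    params = urlencode({"query": adql, "format": "json"})
    return Request(f"{EXOPLANET_TAP_URL}?{params}")


# "YOUR_ADS_TOKEN" is a placeholder, not a working credential.
ads_request = build_ads_request("TRAPPIST-1e atmosphere", token="YOUR_ADS_TOKEN")
planet_request = build_exoplanet_request("TRAPPIST-1 e")
print(ads_request.full_url)
print(planet_request.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call. Keeping construction separate from sending makes the retrieval layer easy to test offline.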

From phone to cluster

  • Nano (1-3B): Runs everywhere. 2-4 GB RAM. Phone, laptop, RPi
  • Core (7-8B, building now): The sweet spot. 4-8 GB VRAM. Mac M2+, RTX 3060+
  • Pro (14-32B): Personal cloud power. 10-24 GB VRAM. A100, RTX 4090
  • Ultra (70B+): Institutional grade. 40-80+ GB VRAM. Multi-GPU cluster
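
As a rough illustration of how the tiers above map to hardware, the sketch below picks the largest tier whose minimum memory fits on a given machine. The function `pick_tier` and the thresholds are assumptions: each threshold is simply the lower bound of the memory range listed for that tier, and RAM and VRAM are treated as a single number for simplicity.

```python
from typing import Optional

# (tier name, minimum memory in GB), taken from the lower bound of each
# range listed above. Treating RAM and VRAM as one number is a simplification.
TIERS = [("Nano", 2), ("Core", 4), ("Pro", 10), ("Ultra", 40)]


def pick_tier(memory_gb: float) -> Optional[str]:
    """Return the largest tier whose minimum memory fits, or None."""
    best = None
    for name, min_gb in TIERS:  # tiers are ordered from smallest to largest
        if memory_gb >= min_gb:
            best = name
    return best


print(pick_tier(8))   # Core
print(pick_tier(24))  # Pro
print(pick_tier(1))   # None
```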

Built on the open astronomy ecosystem

  • NASA ADS: 15M+ publications, citation graphs, co-readership
  • SIMBAD: 20.5M astronomical objects, cross-identifications
  • Exoplanet Archive: 5,700+ confirmed planets with full parameters
  • NED: Extragalactic objects, galaxies, quasars, AGN
  • PDS: NASA planetary mission data archives
  • Gaia: 1.7B stars with positions and distances

All data sources are free, open access, and publicly funded. AstroLLM stands on the shoulders of decades of open astronomical infrastructure.

Building in the open

  • Phase 1 (now): Retrieval-grounded copilot. ADS + SIMBAD + citations + first fine-tune
  • Phase 2: Expanded tools + serious model. NED, PDS, Gaia integration + DPO alignment
  • Phase 3: Scientific tool ecosystem. Model family + continuous learning + API
  • Phase 4+: Multimodal knowledge house. Spectra, images, light curves + AION-1 bridge

Follow the build

AstroLLM is built in the open. Models, training data, evaluation benchmarks, and the complete training pipeline will be published as they're developed.