AstroLLM
A domain-specialized language model for astronomy and astrophysics
An open intelligence layer for astronomy — retrieval-grounded, tool-integrated, and built in the open.
Not a benchmark model.
A knowledge system.
AstroLLM is building an open intelligence layer for astronomy — connecting the world's astronomical literature, databases, and observations through specialized language models that can retrieve, reason, cite, and teach.
Existing astronomy LLMs optimize for benchmark scores. AstroLLM optimizes for the workflows researchers actually use: finding papers, resolving objects, explaining evidence, and teaching at the right level.
- Retrieval-grounded: every answer cites real papers from NASA ADS
- Tool-integrated: queries SIMBAD, the NASA Exoplanet Archive, and other astronomical databases live
- Openly built: models, data, evaluation, and training pipeline are all open source
Retrieve. Reason. Cite.
- Query
"What do we know about TRAPPIST-1e's atmosphere?"
- Retrieve
Search 15M+ papers via NASA ADS
Resolve objects via SIMBAD (20.5M astronomical objects)
Cross-reference with NASA Exoplanet Archive
- Reason
Fine-tuned astronomy model interprets evidence
Adapts explanation depth to your level
- Cite
Every claim linked to source papers
ADS bibcodes trace back to the literature
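The Retrieve, Reason, Cite flow above can be sketched roughly as follows. This is a minimal illustration, not AstroLLM's actual code: `Paper`, `Answer`, `answer_query`, and the injected `search_papers` and `generate` callables are all hypothetical names standing in for the ADS search step and the fine-tuned model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paper:
    bibcode: str   # ADS bibcode identifying the source paper
    title: str
    abstract: str


@dataclass
class Answer:
    text: str
    citations: List[str]  # bibcodes of the papers the answer was built from


def answer_query(
    query: str,
    search_papers: Callable[[str], List[Paper]],   # hypothetical retrieval step
    generate: Callable[[str, List[Paper]], str],   # hypothetical model call
    max_papers: int = 5,
) -> Answer:
    """Retrieve papers, generate an answer from them, and attach citations."""
    # Retrieve: fetch candidate papers for the query (for example from NASA ADS).
    papers = search_papers(query)[:max_papers]
    if not papers:
        # Without evidence, decline instead of returning an uncited answer.
        return Answer(text="No supporting papers found.", citations=[])
    # Reason: the model is given only the retrieved evidence.
    text = generate(query, papers)
    # Cite: the answer carries the bibcodes of every paper it was given.
    return Answer(text=text, citations=[p.bibcode for p in papers])
```

Passing the retrieval and generation steps in as callables keeps the sketch independent of any particular API client or model runtime, and makes the "no evidence, no answer" rule easy to test with stubs.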
From phone to cluster
Nano
Runs everywhere
2-4 GB RAM
Phone, laptop, RPi
Core
The sweet spot (building now)
4-8 GB VRAM
Mac M2+, RTX 3060+
Pro
Personal cloud power
10-24 GB VRAM
A100, RTX 4090
Ultra
Institutional grade
40-80+ GB VRAM
Multi-GPU cluster
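As a rough illustration of how the tiers above map to hardware, the sketch below picks the largest tier whose lower memory bound fits the available memory. The `Tier` type, the `TIERS` table layout, and `pick_tier` are hypothetical; only the tier names and memory figures come from the list above. Note the simplification: Nano is specified in system RAM while the other tiers are specified in VRAM, and the sketch treats both as a single number.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    min_gb: float      # lower bound of the memory range quoted for the tier
    memory_kind: str   # "RAM" or "VRAM", as quoted for the tier


# Lower bounds taken from the tier list; the table itself is an assumption.
TIERS = [
    Tier("Nano", 2, "RAM"),
    Tier("Core", 4, "VRAM"),
    Tier("Pro", 10, "VRAM"),
    Tier("Ultra", 40, "VRAM"),
]


def pick_tier(available_gb: float) -> Optional[Tier]:
    """Return the largest tier whose minimum memory fits, or None if none do."""
    fitting = [t for t in TIERS if available_gb >= t.min_gb]
    return max(fitting, key=lambda t: t.min_gb) if fitting else None
```

Under these assumptions, `pick_tier(8)` selects Core and `pick_tier(24)` selects Pro, while anything under 2 GB returns `None`.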
Built on the open astronomy ecosystem
- NASA ADS 15M+ publications, citation graphs, co-readership
- SIMBAD 20.5M astronomical objects, cross-identifications
- Exoplanet Archive 5,700+ confirmed planets with full parameters
- NED Extragalactic objects, galaxies, quasars, AGN
- PDS NASA planetary mission data archives
- Gaia 1.7B stars with positions and distances
All data sources are free, open access, and publicly funded. AstroLLM stands on the shoulders of decades of open astronomical infrastructure.
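To make the NASA ADS integration concrete, here is a small sketch that builds (but does not send) a search request for the public ADS search API. The endpoint, the `q`, `fl`, and `rows` parameters, and the bearer-token header reflect the ADS API as commonly documented, so check the current ADS documentation before relying on them. `build_ads_request` is a hypothetical helper, not part of AstroLLM.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the public ADS search API; verify against the ADS docs.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_ads_request(query: str, token: str, rows: int = 5) -> Request:
    """Build an ADS search request without sending it."""
    params = urlencode({
        "q": query,                    # free-text or fielded search query
        "fl": "bibcode,title,year",    # fields to return for each paper
        "rows": rows,                  # maximum number of results
    })
    return Request(
        f"{ADS_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},  # personal ADS API token
    )
```

The returned object can be passed to `urllib.request.urlopen` with a valid token; each result's `bibcode` field is what the citation step would link back to.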
Building in the open
Retrieval-grounded copilot
ADS + SIMBAD + citations + first fine-tune
Expanded tools + serious model
NED, PDS, Gaia integration + DPO alignment
Scientific tool ecosystem
Model family + continuous learning + API
Multimodal knowledge house
Spectra, images, light curves + AION-1 bridge
Follow the build
AstroLLM is built in the open. Models, training data, evaluation benchmarks, and the complete training pipeline will be published as they're developed.