Projects Registry

The Projects Registry tracks notable open source projects at Stanford. 

The breadth of the registry highlights the expertise of the research community, potential touchpoints for further discussion and collaboration, and the sheer amount of open source development happening on campus. It spans many disciplines that cover a wide range of experience levels and interests.

Currently, there are 76 entries.

If you would like to add your project to the registry or amend an existing project, please contact the Technical Community Manager (Francesca Vera) with relevant information.

Projects

The projects listed within this section make up the nominees for our Open Source Software Prize.

The Enzo Project

Tom Abel

Enzo is a community-developed adaptive mesh refinement simulation code designed for multi-physics astrophysical calculations. It is freely available and has been featured in numerous journal articles.

View Project

yt project

Tom Abel

yt is an open source Python package for analyzing and visualizing volumetric data, and it has been applied in domains like astrophysics, seismology, nuclear engineering, molecular dynamics, and oceanography.

View Project

Multi-Scale-Initial-Conditions (MUSIC)

Tom Abel

Multi-Scale-Initial-Conditions (MUSIC) is a computer program that generates nested grid initial conditions for high-resolution cosmological simulations.

View Project

Dedalus

Tom Abel

Dedalus is an open source flexible framework written in Python for spectrally solving differential equations. It is developed and used to study fluid dynamics.

View Project

CDUGKS

Tom Abel

Open source implementation of the Coupled Discrete Unified Gas Kinetic Scheme (CDUGKS) that is capable of handling a wide range of flow regimes. It has been used on well-known test problems from astrophysical fluid dynamics literature.

View Project

cvc5

Clark Barrett, Cesare Tinelli

cvc5 is an open source solver for Satisfiability Modulo Theories (SMT) problems, used to determine the satisfiability of first-order formulas with respect to combinations of useful background theories. It is written entirely in C++ and is the fifth in the Cooperating Validity Checker family of tools.

View Project

Mobile ALOHA

Chelsea Finn

Mobile ALOHA is a low-cost open source hardware system for imitation learning of mobile manipulation tasks that are bimanual and require whole-body control. It extends the ALOHA system with a mobile base and a whole-body teleoperation interface.

View Project

ARNIE

Rhiju Das

ARNIE is a Python package that is widely used to compute RNA energetics and predict RNA structure. The work has led to many peer-reviewed publications, including some with important applications to current RNA medicine (e.g. mRNA vaccine stabilization co-authored with Pfizer).

View Project

Eterna

Rhiju Das

Eterna is an open science platform that allows users to solve puzzles using RNAs and advance medical research. It has reached roughly 300,000 registered participants through its video game interface.

View Project

OpenSim

Scott Delp, Karen Liu

OpenSim is an open source software project for modeling and simulation of the musculoskeletal system that has been downloaded by nearly 80,000 individuals. It is used in research and teaching, and it has a worldwide community of contributors.

View Project

OpenCap

Scott Delp, Karen Liu

OpenCap is an open source platform for computing the kinematics and dynamics of human movement from videos recorded with two or more smartphones.

View Project

AddBiomechanics

Scott Delp, Karen Liu

AddBiomechanics is a method to automate and standardize analyzing human movement dynamics from motion capture data, making it easier and faster to generate high-quality biomechanics data. It aims to create a repository of biomechanics data that is open to the research community.

View Project

EternaFold

Rhiju Das

EternaFold enables RNA structure prediction through multitask learning on diverse crowdsourced data from the Eterna project. Its training tasks include 1) predicting single structures, 2) maximizing the likelihood of structure probing data, and 3) predicting experimentally-measured affinities of RNA molecules to proteins and small molecules.

View Project

Palladio

Dan Edelstein

Palladio is an open source toolbox that helps humanities researchers visualize complex historical data with ease.

View Project

Alpaca

Tatsunori Hashimoto

The Stanford Alpaca project presents an instruction-following language model intended for academic research, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA model.

View Project

phyloseq

Susan Holmes

phyloseq is an R package that provides classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

View Project

dada2

Susan Holmes

dada2 is an R package for high-resolution sample inference from high-throughput amplicon sequencing data. 

View Project

CytoGLMM

Susan Holmes

CytoGLMM is an R package that implements two multiple regression strategies: a bootstrapped generalized linear model and a generalized linear mixed model. Its narrower focus on just one cell type allows more specific statistical modeling with easier control of statistical guarantees.

View Project

decontam

Susan Holmes

decontam is an R package that provides simple statistical identification of contaminating sequence features in marker-gene or metagenomics data.

View Project

Causal Inference Methods

Guido Imbens

Open source software for causal inference methods, in particular for matching estimators and regression discontinuity, in Stata and MATLAB.

View Project

PyG

Jure Leskovec

PyG is a library built on PyTorch for easily writing and training Graph Neural Networks. It is one of the most popular libraries for machine learning on graphs (over 100,000 downloads per month and over 300 contributors) and is friendly to both researchers and first-time users.

View Project

Stanford Network Analysis Platform (SNAP)

Jure Leskovec

The Stanford Network Analysis Platform (SNAP) is a general-purpose network analysis and graph mining library that scales to massive networks, available in C++ and Python. It is downloaded over 35,000 times per year and its related datasets are accessed over 800,000 times per year.

View Project

Dynamax

Scott Linderman

Dynamax is a library for probabilistic state space models (SSMs) written in JAX. It is a JAX-based version of SSM, resulting from a collaboration with Google researchers and developers.

View Project

SimVascular

Alison Marsden

SimVascular is the only fully open-source software package providing a complete pipeline from medical image data segmentation to patient-specific blood flow simulation and analysis.

View Project

Homa Transport Protocol

John Ousterhout

The Homa Transport Protocol is a new network transport protocol for modern datacenters. It is a redesign of network transport that is 10-100x faster than TCP for a variety of significant use cases.

View Project

NiPreps

Russell Poldrack

NiPreps (NeuroImaging PREProcessing toolS) is a community project that provides researchers with applications that allow them to prepare data for modeling and statistical analysis, and perform quality control smoothly.

View Project

Metabias

Maya Mathur

Metabias provides software (R packages and websites) that implements novel statistical methods for studying within-study and across-study bias in meta-analysis and causal inference. As of 2020, the R packages had been downloaded more than 33,000 times.

View Project

Glmnet

Trevor Hastie

Glmnet is a package for R that provides extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for a range of linear modeling approaches.

View Project

FlashAttention

Chris Ré

FlashAttention is an algorithm to speed up attention and reduce its memory footprint. It is now upstreamed into almost all popular libraries for AI (from PyTorch to OpenAI's system).

View Project

Trove

Nigam Shah

Trove is a research framework for building weakly supervised (bio)medical named entity recognition and other entity attribute classifiers without hand-labeled training data. It has been used as part of several COVID-19 research efforts at Stanford.

View Project

Swarm

Mike Snyder, Amir Bahmani

Swarm is a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. It has reduced computational costs, run-time delays, and risks of security breach and privacy violation.

View Project

Hummingbird

Mike Snyder, Amir Bahmani

Hummingbird is a Python framework for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms. It gives a variety of configurations to run genomics pipelines on cloud platforms.

View Project

Image Systems Engineering Toolbox for Biology (ISETBio)

Brian Wandell

The Image Systems Engineering Toolbox for Biology (ISETBio) supports vision science calculations and provides tools for modeling image systems engineering in the human visual system.

View Project

Image Systems Engineering Toolbox for Cameras (ISETCam)

Brian Wandell

The Image Systems Engineering Toolbox for Cameras (ISETCam) is a MATLAB toolbox for designing and evaluating imaging systems. The ISETCam repository includes code that pertains to sensor and display models.

View Project

Vistasoft

Brian Wandell

Vistasoft is a MATLAB toolbox for analyzing Magnetic Resonance Imaging (MRI) data. Vistasoft tools span anatomical, functional, and diffusion analyses. It also includes some sample data.

View Project

generalized random forests (grf)

Susan Athey, Stefan Wager

generalized random forests (grf) is a library in R for estimating causal effects with modern non-parametric and machine learning methods.

View Project

Bayesian Embedding (BEMB)

Susan Athey

Bayesian Embedding (BEMB) is a flexible, fast Bayesian embedding model used for modeling choice problems. The BEMB package is built upon the torch-choice library, which allows for large-scale choice modeling with Python.

View Project

torch-choice

Susan Athey

torch-choice is an open source PyTorch package for flexible, fast, large-scale choice modeling with Python. It is a toolkit that can be used for modeling consumer behaviors.

View Project

Meerkat

Karan Goel, Sabri Eyuboglu, Arjun Desai

Meerkat is an open source Python library that helps users visualize, explore, and annotate any dataset, especially when processing unstructured data types (e.g. free text, PDFs, images, video) with machine learning models.

View Project

Retina

Zakir Durumeric

Retina is a network analysis framework that enables operators and researchers to conduct 100+ Gbps traffic analysis on a single server with no specialized hardware.

View Project

ZMap

Zakir Durumeric

The ZMap Project is a collection of open source tools for performing large-scale studies of the hosts and services that compose the public Internet.

View Project

SimPEG

Seogi Kang

SimPEG provides geophysical simulation and inversion tools built in a consistent framework to contribute to a growing community of geoscientists that is building an open foundation for geophysics.

View Project

HealthGPT

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

HealthGPT is an experimental iOS app that allows users to interact with their health data stored in the Apple Health app using a chat-style interface. It offers those looking to make LLM-powered apps within the Apple Health ecosystem an easy-to-extend solution.

View Project

Stanford Spezi

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

Stanford Spezi is an open source framework that is designed for digital health applications and makes it easy to build an app with modules that include questionnaires, data collection from wearable devices, and integration with electronic record systems.

View Project

LLM on FHIR

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

The LLM on FHIR application demonstrates the power of LLMs to explain and provide helpful context around patient data that is provided in the FHIR format. LLM on FHIR supports multiple languages (English, Spanish, Chinese, German, and French) and converses with users based on their system language.

View Project

InVEST

Natural Capital Project

InVEST is a suite of open source software models used to map and value the goods and services from nature that sustain human life. Governments, non-profits, international lending institutions, and corporations all manage natural resources for multiple uses and evaluate tradeoffs among them. The multi-service, modular design of InVEST provides an effective tool for balancing the environmental and economic goals of these entities.

View Project

PyGeoprocessing

Natural Capital Project

PyGeoprocessing is a Python/Cython-based library that provides a set of commonly used raster, vector, and hydrological operations for GIS processing. PyGeoprocessing is developed at the Natural Capital Project to create a programmable, open source, and free Python-based GIS processing library to support the InVEST toolset.

View Project

ViLLM

Sanmi Koyejo, Sang Truong

ViLLM is an end-to-end framework for fine-tuning, evaluating, and deploying Vietnamese large language models. It is designed to be a comprehensive toolkit for researchers and practitioners working with large language models in Vietnamese.

View Project

Protégé

Mark Musen

Protégé is an editor for ontologies and knowledge graphs.  It is the most widely used technology for creating ontologies in the world, with hundreds of thousands of downloads.  The system has been developed in the form of both a desktop application and a Web-based system.  Protégé is being used to develop and maintain major ontologies, including the International Classification of Diseases (ICD), the NCI Thesaurus, the Gene Ontology, and many others.

View Project

BioPortal

Mark Musen

BioPortal is an online repository of all publicly available biomedical ontologies and controlled terminologies in the world, with more than 1000 resources.  BioPortal allows users to search for ontologies, to browse them, and to review information about their provenance and use.  BioPortal content is available through a Web-based user interface and an API.  Additional components determine how terms in one ontology map to terms in other ontologies in the system, and which ontologies might be most useful for describing the content in particular situations.  The BioPortal software forms the basis of a system known as OntoPortal, which is used by an international community of investigators to host ontologies in different scientific disciplines.

View Project

LiCoRICE

Paul Nuyujukian

LiCoRICE is an open-source computational platform implementing model-based design that performs realtime processing of data. It facilitates the execution of numerical operations written in Python with empirical realtime guarantees.

View Project

Labs & Groups

Stanford Neuromuscular Biomechanics Lab

Scott Delp, Karen Liu

The Stanford Neuromuscular Biomechanics Lab shares machine learning and biomechanical modeling and analysis tools from the lab group's research.

View Project

Stanford Trustworthy AI Research (STAIR)

Sanmi Koyejo

The Stanford Trustworthy AI Research group develops principles and practice of trustworthy machine learning, including robust federated machine learning and metric elicitation (selecting more effective machine learning metrics via human interaction). Research applications include neuroimaging, healthcare, and biomedical imaging.

View Project

Linderman Lab

Scott Linderman

The Linderman Lab works at the intersection of machine learning and computational neuroscience, developing a variety of software for modeling neural and behavioral time series. The most used software developed by the lab is SSM, a Python package for probabilistic state space modeling.

View Project

Sustainability and Artificial Intelligence Lab

Marshall Burke, Stefano Ermon, David Lobell

The Sustainability and Artificial Intelligence Lab is an interdisciplinary group that uses machine learning, remote sensing, and survey data to help make the world wealthier, healthier, and greener.

View Project

Lobell Lab

David Lobell

The Lobell Lab uses a range of modern tools to study the interactions between food production, food security, and the environment.

View Project

Stanford Natural Language Processing (NLP) Group

Chris Manning

The Stanford Natural Language Processing (NLP) Group provides statistical NLP, deep learning NLP, and rule-based NLP tools used for solving major computational linguistics problems. Stanford NLP Group packages are widely used in industry, academia, and government.

View Project

Poldrack Lab

Russell Poldrack

The Poldrack Lab develops tools for processing and analysis of brain imaging data. The lab's research code is openly available and (unless otherwise noted) is released under the permissive MIT License.

View Project

Hazy Research

Chris Ré

Hazy Research is a Computer Science research group interested in building foundations for the next generation of machine learning systems. The group produces open source projects that are widely used in AI and data science.

View Project

Stanford Radio Glaciology

Dustin Schroeder

The Stanford Radio Glaciology research group focuses on advancing the foundations of geophysical ice penetrating radar and its applications in observing and understanding the interaction of ice and water in the solar system.

View Project

CodeX - Stanford Center for Legal Informatics

Roland Vogl

CodeX, the Stanford Center for Legal Informatics, emphasizes the research and development of computational law to bring legal efficiency, transparency, and access to legal systems around the world. The CodeX Insurance Initiative brings the technology of computable contracts to fundamental problems in the insurance ecosystem.

View Project

Stanford Biodesign Digital Health Group

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

The Stanford Biodesign Digital Health Group advances digital health research and applications by fostering an accessible health ecosystem. The group develops, implements, and investigates digital health solutions that improve health journeys, including its flagship project, Stanford Spezi.

View Project

Natural Capital Project

Gretchen Daily

The Natural Capital Project (NatCap) aims to improve the well-being of all by motivating greater investment in natural capital. It was co-created by numerous Stanford students, senior researchers, software engineers, and faculty, as well as a large global research community. NatCap's tools have been deployed in virtually all (>185) countries.

View Project

Data

While we recognize that there are code aspects to these projects, the following are primarily open data projects.

Wordbank

Michael Frank

Wordbank is an open database of children's vocabulary development, containing data from 84,139 children and 95,745 CDI administrations. It spans 38 languages and 78 instruments.

View Project

MetaLab

Michael Frank

MetaLab is a database that contains almost 3,000 effect sizes across early language and cognitive development domains, based on data from 747 papers and 48,529 subjects.

View Project

Peekbank

Michael Frank

Peekbank is an open database that stores eye-tracking datasets on children's word recognition in an easily accessible format, and the project provides interfaces for accessing the database and visualizing the data. It also provides processing tools for standardizing eye-tracking data.

View Project

childes-db

Michael Frank

childes-db is a database that stores child language datasets from CHILDES in an easily accessible format. It provides a versioning system for corpora and tools that help promote reproducible research.

View Project

Vascular Model Repository

Alison Marsden

The Vascular Model Repository is an open source database of normal and diseased cardiovascular models available for academic, government, and industry researchers. The models can be used to verify computational methods for fluid and solid mechanics. It received a DataWorks Achievement Award in 2023.

View Project

BLASTNet

Matthias Ihme

BLASTNet (Bearable Large Accessible Scientific Training Network-of-Datasets) is a community-hosted web platform that aims to address gaps in open machine learning for fluid mechanics by providing researchers with open source resources. The data supports a variety of machine learning applications involving fluid flows tied to energy and the environment.

View Project

METER-ML

Andrew Ng, Robert Jackson

METER-ML is a multi-sensor dataset containing georeferenced NAIP, Sentinel-1, and Sentinel-2 images in the U.S. labeled for the presence/absence of methane source facilities. It includes a baseline model for the classification of methane source type (concentrated animal feeding operations, coal mines, landfills, natural gas processing plants, oil refineries and petroleum terminals, and wastewater treatment plants).

View Project

OpenNeuro

Russell Poldrack

OpenNeuro is an open source platform for the archiving and sharing of neuroimaging data. The platform validates and shares Brain Imaging Data Structure (BIDS) compliant MRI, PET, MEG, EEG, and iEEG data.

View Project

Platforms & Utilities

SimTK

Scott Delp, Karen Liu

SimTK is a project-hosting platform that enables members of the biomedical computation community to share software, data, and models. It hosts OpenSim and over 1,600 other projects.

View Project

Open Graph Benchmark (OGB)

Jure Leskovec

The Open Graph Benchmark (OGB) provides datasets, data loaders, and evaluators for benchmarking graph machine learning methods. It has become the gold standard for research in machine learning for graphs.

View Project

Brain Imaging Data Structure (BIDS) Validator

Russell Poldrack

The Brain Imaging Data Structure (BIDS) Validator is a validation tool used for the ingestion of data into OpenNeuro.

View Project

CEDAR

Mark Musen

CEDAR is technology that helps scientists to encode in machine-readable form community standards for metadata to describe their datasets; to store and manage these metadata descriptions in a library of metadata templates; and to use the templates to author new metadata in a manner that ensures adherence to the community standards.  As a result, CEDAR ensures that the metadata that annotate data are adherent to community standards, and that the datasets are findable, accessible, interoperable, and reusable (FAIR).  CEDAR has been incorporated into systems that manage scientific data such as the Open Science Framework (OSF) and the Dryad data repository, and it is an important technology in support of open science.

View Project

Education

Modern Statistics for Modern Biology

Susan Holmes

Modern Statistics for Modern Biology is an open source textbook that is continually maintained and updated online. The print version of the book was published in 2019.

View Project