Projects Registry

The Projects Registry tracks notable open source projects at Stanford.

The breadth of the registry highlights the expertise of the research community, potential touchpoints for further discussion and collaboration, and the sheer amount of open source development happening on campus. It spans many disciplines and covers a wide range of experience levels and interests.

Currently, there are 76 entries.

If you would like to add your project to the registry or amend an existing project, please contact the Technical Community Manager (Francesca Vera) with relevant information.

Projects

The projects listed within this section make up the nominees for our Open Source Software Prize.

yt project

yt is an open source Python package for analyzing and visualizing volumetric data, and it has been applied in domains like astrophysics, seismology, nuclear engineering, molecular dynamics, and oceanography.

CDUGKS

Open source implementation of the Coupled Discrete Unified Gas Kinetic Scheme (CDUGKS) that is capable of handling a wide range of flow regimes. It has been used on well-known test problems from astrophysical fluid dynamics literature.

ARNIE

ARNIE is a Python package that is widely used to compute RNA energetics and for RNA structure prediction. The work has led to many peer-reviewed publications, including some with important applications to current RNA medicine (e.g. mRNA vaccine stabilization co-authored with Pfizer).

OpenCap

Scott Delp, Karen Liu

OpenCap is an open source platform that allows computing kinematics and dynamics of human movement using videos from two or more smartphones.

EternaFold

EternaFold enables RNA structure prediction through multitask learning on diverse crowdsourced data from the Eterna project. Its training tasks include 1) predicting single structures, 2) maximizing the likelihood of structure probing data, and 3) predicting experimentally measured affinities of RNA molecules to proteins and small molecules.

Palladio

Dan Edelstein

Palladio is an open source toolbox that helps humanities researchers visualize complex historical data with ease.

dada2

Susan Holmes

dada2 is an R package for high-resolution sample inference from high-throughput amplicon sequencing data.

Causal Inference Methods

Guido Imbens

Open source software for causal inference methods, in particular for matching estimators and regression discontinuity, in Stata and MATLAB.

Dynamax

Scott Linderman

Dynamax is a library for probabilistic state space models (SSMs) written in JAX. It is a JAX-based version of the SSM library, resulting from a collaboration with Google researchers and developers.

NiPreps

Russell Poldrack

NiPreps (NeuroImaging PREProcessing toolS) is a community project that provides researchers with applications that allow them to prepare data for modeling and statistical analysis, and perform quality control smoothly.

FlashAttention

FlashAttention is an algorithm to speed up attention and reduce its memory footprint. It is now upstreamed into almost all popular libraries for AI (from PyTorch to OpenAI's system).

Hummingbird

Mike Snyder, Amir Bahmani

Hummingbird is a Python framework for predicting the performance of computing instances with varying memory and CPU on multiple cloud platforms. It offers a variety of configurations for running genomics pipelines on cloud platforms.

Vistasoft

Brian Wandell

Vistasoft is a MATLAB toolbox for analyzing Magnetic Resonance Imaging (MRI) data. Vistasoft tools span anatomical, functional, and diffusion analyses. It also includes some sample data.

torch-choice

Susan Athey

torch-choice is an open source PyTorch package for flexible, fast, large-scale choice modeling with Python. It is a toolkit that can be used for modeling consumer behaviors.

ZMap

Zakir Durumeric

The ZMap Project is a collection of open source tools for performing large-scale studies of the hosts and services that compose the public Internet.

Stanford Spezi

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

Stanford Spezi is an open source framework that is designed for digital health applications and makes it easy to build an app with modules that include questionnaires, data collection from wearable devices, and integration with electronic record systems.

InVEST

Natural Capital Project

InVEST is a suite of open source software models used to map and value the goods and services from nature that sustain human life. Governments, non-profits, international lending institutions, and corporations all manage natural resources for multiple uses and evaluate tradeoffs among them. The multi-service, modular design of InVEST provides an effective tool for balancing the environmental and economic goals of these entities.

PyGeoprocessing

Natural Capital Project

PyGeoprocessing is a Python/Cython-based library that provides a set of commonly used raster, vector, and hydrological operations for GIS processing. PyGeoprocessing is developed at the Natural Capital Project to create a programmable, free, and open source Python-based GIS processing library to support the InVEST toolset.

Protégé

Protégé is an editor for ontologies and knowledge graphs. It is the most widely used technology for creating ontologies in the world, with hundreds of thousands of downloads. The system has been developed in the form of both a desktop application and a Web-based system. Protégé is being used to develop and maintain major ontologies, including the International Classification of Diseases (ICD), the NCI Thesaurus, the Gene Ontology, and many others.

BioPortal

BioPortal is an online repository of all publicly available biomedical ontologies and controlled terminologies in the world, with more than 1000 resources. BioPortal allows users to search for ontologies, to browse them, and to review information about their provenance and use. BioPortal content is available through a Web-based user interface and an API. Additional components determine how terms in one ontology map to terms in other ontologies in the system, and which ontologies might be most useful for describing the content in particular situations. The BioPortal software forms the basis of a system known as OntoPortal, which is used by an international community of investigators to host ontologies in different scientific disciplines.

Labs & Groups

Stanford Trustworthy AI Research (STAIR)

Sanmi Koyejo

The Stanford Trustworthy AI Research group develops the principles and practice of trustworthy machine learning, including robust federated machine learning and metric elicitation, which selects more effective machine learning metrics via human interaction. Research applications include neuroimaging, healthcare, and biomedical imaging.

Lobell Lab

David Lobell

The Lobell Lab uses a range of modern tools to study the interactions between food production, food security, and the environment.

Hazy Research

Hazy Research is a Computer Science research group interested in building foundations for the next generation of machine learning systems. The group produces open source projects that are widely used in AI and data science.

CodeX - Stanford Center for Legal Informatics

Roland Vogl

CodeX, the Stanford Center for Legal Informatics, emphasizes the research and development of computational law to bring legal efficiency, transparency, and access to legal systems around the world. The CodeX Insurance Initiative brings the technology of computable contracts to fundamental problems in the insurance ecosystem.

Stanford Biodesign Digital Health Group

Oliver Aalami, Paul Schmiedmayer, Vishnu Ravi

The Stanford Biodesign Digital Health Group advances digital health research and applications by fostering an accessible health ecosystem. The group develops, implements, and investigates digital health solutions that improve health journeys, including its flagship project, Stanford Spezi.

Natural Capital Project

Gretchen Daily

The Natural Capital Project (NatCap) aims to improve the well-being of all by motivating greater investment in natural capital. It was co-created by numerous Stanford students, senior researchers, software engineers, and faculty, as well as a large global research community. NatCap's tools have been deployed in virtually all (>185) countries.

Data

While we recognize that there are code aspects to these projects, the following are primarily open data projects.

MetaLab

Michael Frank

MetaLab is a database that contains almost 3,000 effect sizes across early language and cognitive development domains, based on data from 747 papers and 48,529 subjects.

Vascular Model Repository

Alison Marsden

The Vascular Model Repository is an open source database of normal and diseased cardiovascular models available for academic, government, and industry researchers. The models can be used to verify computational methods for fluid and solid mechanics. It received a DataWorks Achievement Award in 2023.

BLASTNet

Matthias Ihme

BLASTNet (Bearable Large Accessible Scientific Training Network-of-Datasets) is a community-hosted web platform that aims to address gaps in open machine learning, specifically for fluid mechanics, by providing researchers with open source resources. The data supports a variety of machine learning applications involving fluid flows in energy and the environment.

METER-ML

Andrew Ng, Robert Jackson

METER-ML is a multi-sensor dataset containing georeferenced NAIP, Sentinel-1, and Sentinel-2 images in the U.S. labeled for the presence/absence of methane source facilities. It includes a baseline model for the classification of methane source type (concentrated animal feeding operations, coal mines, landfills, natural gas processing plants, oil refineries and petroleum terminals, and wastewater treatment plants).

OpenNeuro

Russell Poldrack

OpenNeuro is an open source platform for the archiving and sharing of neuroimaging data. The platform validates and shares Brain Imaging Data Structure (BIDS) compliant MRI, PET, MEG, EEG, and iEEG data.

Platforms & Utilities

Open Graph Benchmark (OGB)

Jure Leskovec

The Open Graph Benchmark (OGB) provides datasets, data loaders, and evaluators for benchmarking graph machine learning methods. It has become the gold standard for research in machine learning for graphs.

CEDAR

CEDAR is technology that helps scientists to encode in machine-readable form community standards for metadata to describe their datasets; to store and manage these metadata descriptions in a library of metadata templates; and to use the templates to author new metadata in a manner that ensures adherence to the community standards. As a result, CEDAR ensures that the metadata that annotate data are adherent to community standards, and that the datasets are findable, accessible, interoperable, and reusable (FAIR). CEDAR has been incorporated into systems that manage scientific data such as the Open Science Framework (OSF) and the Dryad data repository, and it is an important technology in support of open science.

Education