When CENIC was founded in the 1990s, even the creation of a network backbone like CalREN was revolutionary, to say nothing of the decision made by the state’s research and education community to build its own.
Nowadays, such networks are located all over the globe, and the focus has shifted to the member services provided over them, including network-enabled services and resources facilitated by networking organizations and created by their member communities. One of the most powerful examples of such a distributed, member-created compute and storage resource is the CENIC AI Resource, or CENIC AIR.
This trend has evolved to include not only the compute and storage resources (CPUs, GPUs, and data storage devices) but also the data itself and the services to make use of it. These come together in the National Data Platform (NDP), a federated and extensible data ecosystem designed to promote secure collaboration and innovation on top of existing data and cyberinfrastructure, including the National Research Platform (NRP) and CENIC AIR.
Thanks to its team of 38 experts in various fields of research and project administration, headed by San Diego Supercomputer Center (SDSC) Chief Data Science Officer İlkay Altıntaş, the NDP makes vast databases available in a growing number of meteorological, geographic, and geologic sciences, as well as the services needed to make use of them.
The datasets are stored and processed via the NDP Federation, a scalable platform accessed via Points of Presence (PoPs) where users can host services that add value to these data. In addition to the datasets themselves, the NDP also provides the tools and services to access and analyze them, as well as to define workflows, participants, and needed services via the Jupyter notebook-accessible NDP Hub.
Of course, finding just the right datasets for a research subject among the NDP’s large catalog would be challenging without the NDP’s extremely flexible search capabilities. Metadata such as time, topic, correlation, and location (using placenames or boundary polygons) are all searchable—individually or in combination—using the popular Apache Lucene search syntax.
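To give a feel for what a combined search might look like, here is a minimal sketch that assembles a query string in the classic Apache Lucene syntax (fielded terms, quoted phrases, inclusive ranges, and AND). The field names `topic`, `placename`, and `year` are assumptions made up for illustration and may not match the fields the NDP catalog actually exposes.

```python
def lucene_term(field, value):
    """Return a fielded phrase clause, e.g. topic:"wildfire"."""
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def lucene_range(field, start, end):
    """Return an inclusive range clause, e.g. year:[2018 TO 2024]."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Hypothetical field names; check the catalog for the real ones.
query = all_of(
    lucene_term("topic", "wildfire"),
    lucene_term("placename", "San Diego County"),
    lucene_range("year", 2018, 2024),
)
print(query)
```

Each clause can also be used on its own, which mirrors the "individually or in combination" searching described above.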
Datasets can also be accessed in a variety of ways depending on the user’s preferences. They can be staged, streamed, and even filtered so that the entire dataset need not be downloaded for processing. The NDP also features the ability to import datasets into CENIC AIR via widgets, where they can then be processed using CENIC AIR/NRP’s own Jupyter resources. (This includes databases made available not only by CENIC AIR/NRP members but also by any of the 42 institutions participating in the NDP.) Even synthetic datasets can be generated via AI, where the researcher deems appropriate.
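The general idea behind filtering a stream can be sketched in a few lines: records are read one at a time and only the matching ones are kept, so the full dataset never has to sit in memory. This is not the NDP's own interface, which the article does not describe; the function name, column names, and sample rows below are all invented for illustration, and an in-memory buffer stands in for a remote stream.

```python
import csv
import io


def filter_rows(lines, predicate):
    """Yield parsed CSV rows that satisfy predicate, one row at a time."""
    for row in csv.DictReader(lines):
        if predicate(row):
            yield row


# Stand-in for a streamed dataset; the columns and values are made up.
sample = io.StringIO(
    "station,year,temp_c\n"
    "A,2019,31.5\n"
    "B,2021,35.2\n"
    "A,2022,36.1\n"
)

# Keep only the readings above 35 degrees Celsius.
hot = list(filter_rows(sample, lambda r: float(r["temp_c"]) > 35.0))
print([r["station"] for r in hot])
```

Because `filter_rows` is a generator, the same function works unchanged on any iterable of text lines, including a file or a network response read line by line.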
In 2010, Lt. Gen. David A. Deptula, Air Force deputy chief of staff for intelligence, surveillance and reconnaissance, prophetically stated, “We're going to find ourselves in the not too distant future swimming in sensors and drowning in data.” Happily, thanks to data scientists like SDSC’s İlkay Altıntaş and the team behind the NDP, researchers can now explore that vast ocean of data without being overwhelmed by it.
If you’d like to learn more about the NDP, you can watch the January 2025 presentation by the University of Utah’s Manish Parashar at the 6NRP Workshop and download his slides, or visit the National Data Platform.
To learn how your institution can use and participate in CENIC AIR, please contact the NOC at noc@cenic.org or the CENIC Project Management Office.