Radar Operations Center (ROC) in a Box! Chameleon as an On-Demand Multi-Layered Operations Center

What is the central challenge/hypothesis your experiment investigates?

Founded at the Univ. of Massachusetts Amherst in 2003, the NSF Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) has deployed and operates a real-time Doppler weather radar network in the Dallas/Fort Worth metroplex. Processing the rapidly updating, volumetric sensor data into low-latency decision support products requires substantial computing and networking resources and complex scientific workflows. Asynchronous observations must be blended together, meteorological products derived, images generated for display, customized alerts sent for impactful weather, and data archived for climatological research and new product development. To complicate matters, compute load and network traffic typically peak during severe weather events, precisely when the data is most important to users, including the National Weather Service, emergency management, transportation officials, and the general public. The dedicated CASA computing infrastructure has to be capable of handling these high-intensity periods, but remains largely idle on clear and dry days. As such, it is a good candidate for an elastic solution.

In recent years, CASA joined with the Univ. of North Carolina Chapel Hill Renaissance Computing Institute (RENCI) and the Univ. of Southern California Information Sciences Institute (ISI) under the NSF-funded Dynamo* and FlyNet** programs to help researchers use academic compute clouds and to streamline processing sequences and data handling. Workflows associated with the CASA network serve as exemplary case studies that other researchers can use as templates. Data generated by these workflows also provides real benefit to real-time system operations and to the 50+ cities and towns that subscribe for access. Our research examines how best to schedule and deploy computing on the edge-to-cloud continuum, matching the capabilities of the heterogeneous and evolving hardware with the software requirements, always with an eye toward generalizable solutions for the larger scientific community.

How is your research addressing this challenge?

The Dynamo team developed tools and strategies for dynamic provisioning, task scheduling, and system monitoring, to demonstrate and simplify multi-tiered processing solutions across several academic clouds, including Chameleon, Mass Open Cloud, and formerly ExoGENI.

Provisioning covers both computing and networking. To address this, the team developed Mobius (https://github.com/RENCI-NRIG/Mobius), an open-source command-line tool and Python library designed to simplify the acquisition of cloud resources, including software deployment upon instance creation, layer 2 stitching via AL2S, internal LAN setup between cloud nodes, virtual SDX for bandwidth shaping, and hooks into Prometheus with a Grafana front-end interface for resource monitoring. Mobius unifies cloud APIs, making the platform transparent to the researcher.

Task scheduling can be a particularly difficult problem with complicated workflows and a dynamic resource profile underneath. For this, the team adapted existing static product generation scripts into the Pegasus Workflow Management System (https://pegasus.isi.edu/) for orchestration and execution across an arbitrary pool of available worker nodes with HTCondor (https://htcondor.org/). This allows the processing system to scale as the coverage area of storms, and the resulting compute load, increases, without error-prone manual changes by the operator.
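
To make the scheduling idea concrete, here is a minimal sketch of dependency-ordered task dispatch across a pool of workers. It uses only the Python standard library and does not use the Pegasus or HTCondor APIs; the task names, the WORKFLOW graph, and the run_task placeholder are all hypothetical and stand in for real product executables.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "merge_radars": set(),
    "nowcast": {"merge_radars"},
    "hail_detect": {"merge_radars"},
    "render_images": {"nowcast", "hail_detect"},
}


def run_task(name):
    """Placeholder for launching a real product executable."""
    return name


def run_workflow(graph, workers=2):
    """Run every task once, never starting a task before its dependencies finish."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = set()
        while sorter.is_active():
            # Submit every task whose dependencies are already complete.
            for task in sorter.get_ready():
                pending.add(pool.submit(run_task, task))
            # Block until at least one running task completes.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = future.result()
                finished.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return finished


order = run_workflow(WORKFLOW)
```

Raising the workers argument lets independent tasks (here, nowcast and hail_detect) run side by side, which loosely mirrors how a larger worker pool absorbs the extra load during widespread storms.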

How do you structure your experiment on Chameleon?

Matching hardware to software is an important consideration, and Chameleon’s powerful bare metal servers are ideal for resource-intensive processes like nowcasting and hail identification. This output often then serves as the basis for lightweight downstream applications such as image generation and GIS comparisons, which are suited for highly scalable pools of VMs. The ability of Mobius to create and configure layer 2 networks between clouds allows for seamless data transfer between the hybrid platforms to best utilize their individual features.
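
As a rough illustration of matching tasks to hardware, the sketch below picks the smallest resource tier that satisfies a task's requirements. The tier names, core counts, memory sizes, and task requirements are invented for the example and are not taken from the CASA deployment or from Chameleon's hardware catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cores: int
    memory_gb: int


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    memory_gb: int


# Hypothetical tiers: a pool of small VMs and a large bare metal server.
TIERS = [Tier("vm_pool", 4, 16), Tier("bare_metal", 48, 192)]


def place(task, tiers=TIERS):
    """Return the name of the smallest tier that can hold the task."""
    fitting = [
        t for t in tiers
        if t.cores >= task.cores and t.memory_gb >= task.memory_gb
    ]
    if not fitting:
        raise ValueError(f"no tier can run {task.name}")
    return min(fitting, key=lambda t: (t.cores, t.memory_gb)).name


heavy = place(Task("nowcast", cores=32, memory_gb=128))
light = place(Task("render_images", cores=1, memory_gb=2))
```

With these assumed numbers, the heavy nowcasting task lands on the bare metal tier while image rendering goes to the VM pool, leaving the large servers free for the work that needs them.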

Recently, when the data center that hosts CASA’s dedicated Radar Operations Center (ROC) went fully offline for 3 days of maintenance, our experiments were put to the test in real time. CHI@UC bare metal servers were reserved in advance of the outage, and when the time came the system was provisioned with Mobius as a ROC-in-a-box, preconfigured to receive a live stream of radar data on a high-speed Internet2 connection from our data portal at the University of North Texas. Libraries were installed for locally running software modules, and Singularity containers were downloaded with relevant product executables ready to go. As radar data arrived, it was fed into Python scripts that create workflows within the Pegasus WMS for scheduling and data handling. Almost immediately, the system was generating live nowcasting data out to 30 minutes into the future, along with hail detections, as the basis for alerting more than 3,000 emergency management users of our website and our iPhone and Android apps. To no one’s surprise, severe weather occurred throughout the first day of the data center shutdown, and Chameleon was of immense help in continuing operations during the planned outage. The ease of provisioning, along with the well-scripted workflow configurations, allows Chameleon to be used as a standing backup for the UMass ROC.

The FlyNet project looks specifically at how best to stage processing, extending the Dynamo concepts all the way out to small edge devices such as those provided by CHI@Edge. We’ve joined with the University of Missouri to introduce new use cases with respect to small, low-flying aircraft, from consumer drones to air taxis. These air vehicles are consumers of the same weather workflow information for dynamic routing and for improving time and battery usage estimates due to wind, but also have mission-specific goals such as low-latency video transfer, as well as complex digital twin models to anticipate and mitigate problems. As with the CASA use cases, the diverse mission requirements lend themselves to hybrid computing. Moreover, these aircraft fly in potentially constrained environments where network bandwidth may be lacking, and where compute infrastructure may be shared between vehicles. With this in mind, we have started testing KubeEdge, a Kubernetes cluster instantiation tailored to edge devices, using a Chameleon KVM controller node and CHI@Edge devices for processing. We’re examining the ability of the edge devices to receive and process video streams, with the idea that drones may be able to offload recently collected images to shared local resources. A drone could make a request for a video offload, triggering the Kubernetes controller to deploy a server process on the KubeEdge cluster, and then cue the drone to begin streaming when ready. We’ve also utilized GPU-enabled Jetson Nano devices available at CHI@Edge to run neural-network-based object identification algorithms. We look forward to extending this work as new devices come online, and to considering how best to allocate processing between the powerful servers at the core and local computers at the edge.
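
The request, deploy, and cue sequence for a video offload described above can be sketched as a small state machine. This is only an illustration of the control flow: it makes no Kubernetes or KubeEdge calls, and the OffloadController class, the fake_deploy function, and the endpoint format are hypothetical names invented for the example.

```python
from enum import Enum


class State(Enum):
    DEPLOYING = "deploying"
    READY = "ready"


class OffloadController:
    """Tracks one offload session per drone and cues it once a receiver is up."""

    def __init__(self, deploy_receiver):
        # deploy_receiver is any callable that starts a receiving server
        # process for a drone and returns the endpoint it listens on.
        self._deploy_receiver = deploy_receiver
        self.sessions = {}

    def handle_request(self, drone_id):
        self.sessions[drone_id] = State.DEPLOYING
        endpoint = self._deploy_receiver(drone_id)
        self.sessions[drone_id] = State.READY
        # The returned message is the cue telling the drone to start streaming.
        return {"drone": drone_id, "action": "start_stream", "endpoint": endpoint}


def fake_deploy(drone_id):
    """Stand-in for deploying a server process on an edge cluster."""
    return f"rtsp://edge-node.example/{drone_id}"


controller = OffloadController(fake_deploy)
cue = controller.handle_request("drone-7")
```

Passing the deployment step in as a callable keeps the handshake logic separate from whichever cluster API actually launches the receiver, so the same flow can be exercised without any edge hardware.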

*Dynamo– National Science Foundation Grant #OAC-1826997
**FlyNet– National Science Foundation Grant #OAC-2018074

Can you point us to artifacts connected with your experiment that would be of interest to our readership?

Key Publication:
Toward a Dynamic Network-Centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing

Mobius (for cloud provisioning on Chameleon, Mass Open Cloud, and Fabric) https://github.com/RENCI-NRIG/Mobius

Pegasus WMS (to plan and execute complex workflows in a multi node dynamic system)

CASA’s Hail Workflow (derive single and multi radar hail, create images and contours) https://github.com/pegasus-isi/casa-hail-workflow

FlyPaw (FlyNet’s AERPAW integration, utilizing CHI@KVM nodes for drone image processing)

Tell us a little bit about yourself

The Dynamo and FlyNet team consists of professors, post-graduates, and doctoral students from four major universities coming together toward a common goal, each bringing a unique skill set and research history to the collaboration: the Missouri engineering team, with a strong background in video processing, neural networks, and the utilization of edge devices for processing; the UNC RENCI team, a true leader in academic compute clouds, networks, and enabling the research community; USC ISI, bringing software expertise and the ability to promote efficiency and scalability of user workflows with the Pegasus WMS; and UMass Amherst ECE, with real-world use cases and observations from a multi-faceted operational system.

Most powerful piece of advice for students beginning research or finding a new research project?

Phrase, philosophy, feel free to be creative.
Build on your strengths; you know them by now. By building on your strengths, it is very likely that you will (mostly) enjoy what you are doing. This alone will serve as an excellent motivator for your research. In addition, be ready to tackle new problems and acquire the new skills that will help you solve them. Do as much as possible to work on research topics that really motivate you. Finally, find good mentors who will guide you in your research work (and sometimes advise you in more general matters, e.g., work-life balance, career, etc.).

How do you stay motivated through a long research project?

Over the course of my career, I have learned that progress in research does not constantly need new results. Sometimes, one works for weeks with no significant results, and then all of a sudden a new result is produced. The art of being a great researcher is to “sense” whether putting in additional effort will be in vain (since one is heading down the wrong path) or whether it will pay off in the long term.
Tip: Read the biographies of great researchers and you will see that most of them struggle with this issue.

Are there any researchers you admire? Can you describe why?

There are too many to mention. Going back far in time, I’m certainly impressed by what cultures like the Ancient Greeks, the Egyptians, and the Maya had already discovered. If you think about their tools (certainly no electricity or computers), they were really great scientists.

Why did you choose this direction of research?

I am an engineer and it is important for me to perform research that leads to tangible outcomes. As a systems person in Computer Engineering, I can best perform this through experimental research. This includes the building of new, complex systems. I have done that multiple times in my career. While the path of building such systems is paved with many struggles, it is always a tremendous reward when they are up and running. Most satisfying is when I can use them for new experimental research!

