Next-generation scientific applications typically generate colossal amounts of simulation or experimental data, on the order of terabytes at present and exabytes in the near future, which must be stored, managed, and transferred to different geographical locations for distributed data processing and analysis. Several storage resource management and network resource provisioning projects are under rapid development to facilitate such large-scale data distribution over wide-area networks. However, these existing tools, systems, and services reach only a limited set of users, mainly because deploying them requires some degree of network and host reconfiguration, and most science users are not even aware that they exist in their target network environments. We design and develop a Network-Aware Data Movement Advisor (NADMA), which provides users with a set of feasible route options for data transfer in distributed network environments, together with automatic discovery of storage and network resources and performance estimation.
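To make "route options with performance estimation" concrete, the sketch below enumerates the simple paths between a source and a destination and ranks them by estimated transfer time. It is a minimal illustration, not NADMA's actual estimator: the graph format (`{node: {neighbor: (bandwidth_gbps, latency_s)}}`), the function names `simple_paths` and `rank_routes`, and the cost model (data size divided by the path's bottleneck bandwidth, plus summed link latency) are all assumptions.

```python
def simple_paths(graph, src, dst, path=None):
    """Yield every loop-free path from src to dst (exponential; small graphs only)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, {}):
        if nxt not in path:  # skip nodes already on the path to avoid cycles
            yield from simple_paths(graph, nxt, dst, path)


def rank_routes(graph, src, dst, size_gb):
    """Return (estimated_seconds, path) pairs, fastest first.

    Assumed cost model: size / bottleneck bandwidth + sum of link latencies.
    """
    options = []
    for p in simple_paths(graph, src, dst):
        hops = list(zip(p, p[1:]))
        bottleneck_gbps = min(graph[u][v][0] for u, v in hops)
        latency_s = sum(graph[u][v][1] for u, v in hops)
        est = size_gb * 8 / bottleneck_gbps + latency_s  # GB -> Gb
        options.append((est, p))
    return sorted(options)


# Hypothetical topology: {node: {neighbor: (bandwidth in Gb/s, latency in s)}}
example_graph = {
    "src": {"A": (10.0, 0.01), "B": (5.0, 0.03)},
    "A": {"dst": (1.0, 0.02)},
    "B": {"dst": (5.0, 0.03)},
}

for est, path in rank_routes(example_graph, "src", "dst", size_gb=10.0):
    print(f"{' -> '.join(path)}: ~{est:.2f} s")
```

Here the route through `B` wins even though its first hop is slower, because the `A -> dst` link is a 1 Gb/s bottleneck; capturing this end-to-end effect is the kind of advice a network-aware planner is meant to give.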
This project will design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a set of easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments.
We propose a UDP-based transport method, Performance-Adaptive Peak Link Utilization Transport (PAPLUT), which automates rate stabilization for peak link utilization using stochastic approximation (SA) methods, in contrast to the manual parameter tuning required by the Hurricane transport. Another salient feature of PAPLUT is a performance-adaptive flow control mechanism that regulates the activities of both the sender and the receiver in response to system dynamics, achieving high user-level goodput in high-performance dedicated networks. We use existing SA analysis methods to show the asymptotic stability and convergence of this method in maximizing link utilization over dedicated connections, under fairly mild conditions on the error process and on the underlying throughput, loss, and retransmission profiles. We also provide a mathematical analysis of how system factors affect the performance of transport protocols, and we estimate the receiver's bottleneck rate based on rigorous event modeling and prediction. Finally, we implement our method and benchmark it against other commonly used transport protocols.
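The core idea of SA-based rate stabilization can be illustrated with a generic Robbins-Monro iteration: the sender repeatedly measures a noisy goodput and nudges its rate toward a target using a diminishing gain. This is a sketch under stated assumptions, not PAPLUT's actual control law: the link model in `simulated_goodput`, the target of 900 units, the gain `a / k`, and all function and parameter names are hypothetical.

```python
import random


def simulated_goodput(rate, rng, capacity=1000.0, noise_sd=20.0):
    """Hypothetical link model (not from PAPLUT).

    Goodput follows the sending rate up to capacity, then falls as loss and
    retransmissions grow; zero-mean Gaussian noise stands in for measurement error.
    """
    if rate <= capacity:
        goodput = rate
    else:
        goodput = capacity - 0.5 * (rate - capacity)
    return goodput + rng.gauss(0.0, noise_sd)


def sa_rate_control(target, r0=100.0, a=0.8, steps=2000, seed=1):
    """Robbins-Monro iteration r_{k+1} = r_k + a_k * (target - g(r_k)).

    The gain a_k = a / k satisfies sum(a_k) = inf and sum(a_k^2) < inf,
    the classical step-size conditions for SA convergence.
    """
    rng = random.Random(seed)
    rate = r0
    for k in range(1, steps + 1):
        measured = simulated_goodput(rate, rng)
        rate += (a / k) * (target - measured)
        rate = max(rate, 1.0)  # keep the sending rate positive
    return rate


if __name__ == "__main__":
    # The returned rate settles near the 900-unit goodput target.
    print(round(sa_rate_control(target=900.0), 1))
```

The diminishing gain is what removes manual tuning: early steps move the rate quickly, while later steps average out measurement noise instead of chasing it.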
We conduct an extensive, in-depth study of data routing, sensor deployment, and information fusion in distributed wireless sensor networks.
In high-performance networks, each dedicated channel typically consists of one or more physical links that are shared by multiple applications in both time and bandwidth through advance reservation. We design both instant and periodic bandwidth scheduling algorithms to maximize the utilization of dedicated network resources and meet diverse end-to-end transport performance requirements. An instant scheduling algorithm runs as soon as a new data transfer request arrives and reserves bandwidth in advance on the relevant links, whereas a periodic scheduling algorithm runs once per scheduling interval to schedule the batch of data transfer requests accumulated during that interval.
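The difference between the two modes can be sketched with a simple advance-reservation admission check: a request for bandwidth `b` over a time window fits if, on every link of its path, the peak already-reserved bandwidth within that window plus `b` stays within the link capacity. This is a minimal illustration, not the authors' algorithms: the `Request`, `Link`, and `Scheduler` classes are hypothetical, and the periodic mode's greedy order (smallest bandwidth-duration product first) is an assumed heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    name: str
    path: list        # link ids traversed by the transfer
    bandwidth: float  # requested rate (e.g. Gb/s)
    start: float      # reservation window start (s)
    end: float        # reservation window end (s), exclusive


@dataclass
class Link:
    capacity: float
    reservations: list = field(default_factory=list)  # (start, end, bandwidth)

    def peak_usage(self, start, end):
        """Maximum reserved bandwidth at any instant in [start, end)."""
        # Usage is piecewise constant and only rises at reservation starts,
        # so checking the window start and each start inside it suffices.
        points = {start} | {s for s, _, _ in self.reservations if start <= s < end}
        return max(sum(bw for s, e, bw in self.reservations if s <= t < e)
                   for t in points)


class Scheduler:
    def __init__(self, capacities):
        self.links = {lid: Link(cap) for lid, cap in capacities.items()}

    def fits(self, req):
        return all(self.links[l].peak_usage(req.start, req.end) + req.bandwidth
                   <= self.links[l].capacity for l in req.path)

    def instant(self, req):
        """Instant mode: admit or reject a single request on arrival."""
        if not self.fits(req):
            return False
        for l in req.path:
            self.links[l].reservations.append((req.start, req.end, req.bandwidth))
        return True

    def periodic(self, batch):
        """Periodic mode: schedule requests accumulated over one interval.

        Hypothetical greedy order: smallest bandwidth x duration first.
        """
        ordered = sorted(batch, key=lambda r: r.bandwidth * (r.end - r.start))
        return [r.name for r in ordered if self.instant(r)]


sched = Scheduler({"A-B": 10.0, "B-C": 10.0})
print(sched.instant(Request("r1", ["A-B", "B-C"], 6.0, 0, 100)))  # True
print(sched.instant(Request("r2", ["B-C"], 6.0, 50, 150)))        # False
```

The second request is rejected because link `B-C` would carry 12 units during [50, 100). Batching lets the periodic mode choose an admission order, which an instant scheduler cannot do.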
The advent of large-scale collaborative scientific workflow applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data collections, simulations, visualizations, and analyses. System resources, including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices, have been increasingly deployed around the globe. These resources are typically shared by large communities of users over the Internet or dedicated networks, and hence their availability, accessibility, capacity, and stability are inherently dynamic. The success of these large-scale distributed workflow applications requires a highly adaptive and massively scalable distributed computing platform that provides optimized computing and networking services. We study the computational complexity of mapping DAG-like workflows onto overlay computer networks, and we design mapping algorithms that optimize end-to-end performance in terms of delay, throughput, and reliability.
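A mapping algorithm needs a way to score a candidate mapping. The sketch below evaluates one objective, end-to-end delay, for a given assignment of DAG modules to network nodes by computing the latest finish time along the critical path. The cost model is an assumption, not necessarily the one used in this work: a module's compute time is its cost divided by its node's processing power, a transfer takes data size divided by link bandwidth plus link latency (zero when both modules share a node), and contention among modules sharing a node or link is ignored. All names and numbers are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def end_to_end_delay(modules, edges, mapping, power, bw, latency):
    """Critical-path delay of a workflow DAG under a fixed module-to-node mapping.

    modules: {module: computational cost}
    edges:   {(u, v): data size sent from module u to module v}
    mapping: {module: network node}
    power:   {node: processing speed}
    bw, latency: {(sender_node, receiver_node): bandwidth / latency}
    """
    preds = {m: set() for m in modules}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    # static_order() yields every module after all of its predecessors.
    for m in TopologicalSorter(preds).static_order():
        ready = 0.0
        for u in preds[m]:
            a, b = mapping[u], mapping[m]
            transfer = 0.0 if a == b else edges[(u, m)] / bw[(a, b)] + latency[(a, b)]
            ready = max(ready, finish[u] + transfer)  # wait for the slowest input
        finish[m] = ready + modules[m] / power[mapping[m]]
    return max(finish.values())


modules = {"src": 0.0, "f1": 8.0, "f2": 12.0, "sink": 2.0}
edges = {("src", "f1"): 10.0, ("src", "f2"): 10.0,
         ("f1", "sink"): 5.0, ("f2", "sink"): 5.0}
mapping = {"src": "n1", "f1": "n1", "f2": "n2", "sink": "n3"}
power = {"n1": 4.0, "n2": 6.0, "n3": 2.0}
bw = {("n1", "n2"): 10.0, ("n1", "n3"): 5.0, ("n2", "n3"): 5.0}
latency = {("n1", "n2"): 0.5, ("n1", "n3"): 0.5, ("n2", "n3"): 0.5}

print(end_to_end_delay(modules, edges, mapping, power, bw, latency))  # 6.0
```

In this example the critical path runs `src -> f2 -> sink`: 1.5 s to ship data to `n2`, 2 s of compute, 1.5 s to reach `n3`, and 1 s at the sink. A mapping algorithm would search over `mapping` to minimize this value, which is where the computational complexity question arises.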