Facilities

High-Performance Computing Laboratory



Overview

The High Performance Computing Laboratory in Yang's group consists of 4 AMD-based Linux clusters. The parallel computing system currently has 3398 AMD processor cores. The computing nodes in three of the clusters are connected through a 1 Gbps Nortel/SMC switch, while those in the fourth cluster are connected through two InfiniBand fabric switches. The entire system has an aggregate theoretical peak performance of about 15 teraflops, providing substantial number-crunching capacity for large-scale computations.
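As a rough check on the peak figure, assume the quoted 15 teraflops is the aggregate theoretical peak spread over all 3398 cores. The page gives no per-core figures, and the clusters span several processor generations, so this is only an average. On that assumption, the mean peak per core works out to roughly 4.4 gigaflops:

```latex
R_{\text{core}} \approx \frac{R_{\text{peak}}}{N_{\text{cores}}}
  = \frac{15 \times 10^{12}\ \text{FLOP/s}}{3398}
  \approx 4.4 \times 10^{9}\ \text{FLOP/s}
```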

Parallel computation is carried out using the MPI (Message Passing Interface) library. In addition, several advanced workstations with high-speed graphics cards provide 3D visualization, allowing researchers to explore 3D flow structures obtained from numerical simulations in real time.



History

October 1998
  • The first-generation system, a clone of COCOA, consisted of 20 Dell PW410 workstations (Dual-PII400);
  • A 24-port Fast Ethernet Fore switch was used for interconnection;
  • The DQS queuing system was used;
  • Static IP addresses were assigned to all nodes.
May 1999
  • The system was expanded to 46 nodes, connected through two Fore switches;
  • Dynamic IP addressing was adopted.
October 1999
  • 63 Dell PW 420 workstations (Dual-PIII450) were added to the system;
  • A 3COM 9300 Gigabit switch was used to connect two 3COM 3900 switches;
  • Cooling became a major problem; several fans were added for convective cooling;
  • The PBS queuing system replaced the DQS queuing system.
December 2000
  • 70 Dell PW 220 workstations (Dual-PIII 733) were added to the system;
  • The system was redesigned, including shelves, cables, and servers;
  • Two heavy-duty air conditioners were installed;
  • The electrical panel was upgraded.
January 2003
  • 170 Dell OptiPlex computers (P4 2.4) were added to the system;
  • The Intel Fortran Compiler, which was free at the time and gave better performance, was installed;
  • The electrical power supply became the system's biggest problem.
Summer 2003
  • Another heavy-duty air conditioner and several industrial fans were installed for cooling;
  • A Dell PowerConnect 5012 Gigabit switch was installed for server connections.
February 2004
  • Household fans from Wal-Mart were used to cool the compact Dell GX260s individually;
  • Failed hard drives were removed from the GX260s, and the system was converted to a diskless cluster.
January 2006
  • An 80-node (160-CPU) KiloCluster was installed.
December 2007
  • A 42-node (84-CPU) dual-core KiloCluster was installed.
September 2008
  • A 31-node (62-CPU) dual-core disk-based KiloCluster was installed;
  • The 42-node (84-CPU) dual-core KiloCluster was expanded to a 57-node (114-CPU) dual-core KiloCluster;
  • At this point, the laboratory consisted of 6 racks of computing nodes.
December 2009 
  • A 39-node (936-core) diskless cluster was installed.
     
June 2010  
  • A 20-node (480-core) diskless cluster was installed.
     
June 2012  
  • A 25-node (1600-core) cluster was installed.

October 2014 
  • A 22-node (1408-core) cluster was installed.
     
August 2016 
  • A 25-node (1600-core) cluster was installed.
     
January 2018 
  • An 18-node Intel Xeon 5 cluster was installed to replace the 58-node AMD Opteron cluster.
     
October 2018 
  • The 58-node AMD Opteron cluster was retired.
     



Software

Operating System: Fedora, developed by the Fedora Project and sponsored by Red Hat, Inc. (http://fedoraproject.org); CentOS 5.0 (http://www.centos.org/)
Parallel Environment: Message Passing Interface (MPICH) (http://wwwunix.mcs.anl.gov/mpi/mpich/)
Compilers: Intel Fortran 10.0 (http://www.intel.com); C++ (GCC)
Queue System: Portable Batch System (PBS) Torque 2.3.0 (http://www.openpbs.org) and Maui 3.2.6 (http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php)
Node Management: Warewulf and Perceus (http://www.perceus.org/portal/)
File System: Network File System (NFS) (http://nfs.sourceforge.net/)
User Account Management: Network Information System (NIS) (http://www.linux-nis.org/)
Remote Access: Secure Shell (SSH, Internet) (http://www.openssh.org); Remote Shell (RSH, intranet)
Windows Interoperability: Samba (http://www.samba.org)
Firewall: IP Firewalling Chains (ipchains) (http://netfilter.samba.org/ipchains/)
IP Allocation: Dynamic Host Configuration Protocol (DHCP) (http://www.dhcp.org)
Time Synchronization: Network Time Protocol (NTP) (http://www.ntp.org)
Cluster Management Suite: Scyld ClusterWare HPC 5.0 (http://www.penguincomputing.com/)

                      



Hardware



System Topology  

[Figure: system topology diagram]

 



Photos

       

        Dual-PIII computers, in December 2000

       

        170 Pentium IV computers, in January 2003

         

        510 Pentium processors

       

        AMD KiloCluster with 160 CPUs, in January 2006

       

        Rack-based clusters, in September 2008 (from left: 57-node diskless cluster, 31-node disk-based cluster, 80-node cluster)

       

        December 2009 (39-node diskless cluster based on the then-new AMD Istanbul processors)

        AMD cluster with 3000 CPUs, in April 2013



Last updated 25 November 2015.