Aeolus is managed under an enhanced community condominium model. Investors (individual researchers or groups of affiliated researchers) purchase compute nodes or storage from the Aeolus Investment Catalogue, which offers compute resources and storage options that have been verified for compatibility. The Catalogue will be updated as vendor prices change, and at least annually. Once purchased, nodes are integrated into Aeolus and administered by the Aeolus systems administrators. For a given investment of funds and of faculty and staff time, the VCEA Aeolus enhanced community condominium model for HPC gives individual investors and groups access to more computing resources and storage than they could obtain by purchasing and managing a comparable standalone system themselves.
To support the unique needs of research computing, VCEA has committed to creating both an annual budget and a service center to help supplement infrastructure costs.
Computer components have a finite useful life. For the purposes of Aeolus management, compute nodes and disk drives are assumed to have a useful lifetime of 5 years. Some individual components will fail sooner and some will last longer. These limited lifetimes have implications for Aeolus management, investment timing, and user data management. After five years, compute nodes will be deprecated; they may be used for parts or for low-priority computing before eventually being retired. The investor is not responsible for disposing of a node after its useful life has expired; Aeolus management assumes responsibility for such e-waste disposal. Because useful lifetimes are predictably limited, long-term projects must budget for periodic reinvestment in their proposals.
Prior Investment Policy
To recognize and honor investments made in Aeolus before July 1, 2016, the following describes the policy for grandfathering equipment and access to the cluster. Under the new investment practices, compute nodes and storage have an expected useful lifetime of 5 years.
- Compute nodes purchased before July 1, 2014 will be subject to the lifetime of their warranty when purchased.
- Compute nodes purchased on or after July 1, 2014 will fall under the new Aeolus Investment Policies.
- All prior investments in storage will be subject to the lifetime of the warranty when purchased. In most cases, this is 3 years.
- Additional storage can be purchased in increments of 100 GB. When storage reaches the end of its lifetime, there will be a 6-month grace period for migrating user data to a resource provided by the user. After that, the data will be irrevocably purged.
- Other investments, which fall under neither compute nodes nor data storage, are subject to the lifetime of the equipment’s warranty.
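The grandfathering rules above reduce to a small decision on equipment type and purchase date. The following is a minimal Python sketch of that decision; the function name `applicable_lifetime_rule` and the label strings it returns are hypothetical, chosen only for illustration.

```python
from datetime import date

# Compute nodes bought on or after this date fall under the new
# Aeolus Investment Policies (5-year lifetime).
NEW_POLICY_CUTOFF = date(2014, 7, 1)


def applicable_lifetime_rule(kind: str, purchased: date) -> str:
    """Return the lifetime rule for a prior investment (hypothetical helper).

    kind: "compute", "storage", or anything else (treated as other equipment).
    The returned labels are illustrative, not official policy terms.
    """
    if kind == "compute" and purchased >= NEW_POLICY_CUTOFF:
        return "new policy: 5-year lifetime"
    # Older compute nodes, prior storage (typically 3 years), and all other
    # equipment keep the lifetime of the warranty in force at purchase.
    return "warranty at purchase"


print(applicable_lifetime_rule("compute", date(2013, 9, 15)))  # warranty at purchase
print(applicable_lifetime_rule("compute", date(2014, 7, 1)))   # new policy: 5-year lifetime
print(applicable_lifetime_rule("storage", date(2015, 3, 1)))   # warranty at purchase
```

Storage and "other" equipment share one branch because the policy assigns both the purchase-time warranty lifetime; only compute nodes depend on the purchase date.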
Compute Queue Management
Currently, Aeolus employs a round-robin scheduler via Torque and Maui.
With the implementation of the new virtual/redundant login and management servers, Aeolus will migrate scheduling to the Simple Linux Utility for Resource Management (SLURM). This will be a major step forward and may make it possible to share compute resources with peer clusters, potentially leading to grid computing.
Compute Nodes Funding
The following defines the policy for investing in Aeolus by purchasing compute nodes.
Standard compute nodes can be purchased from our Aeolus Investment Catalogue. For special computing needs, non-standard compute nodes based on the Aeolus Investment Catalogue can be purchased in coordination with, and after approval by, the Aeolus systems administrators.
Investment in Aeolus through the purchase of a compute node guarantees access to Aeolus HPC for the duration of the 5-year hardware lifetime. Additional investments over time can renew the investor’s continuous access to the cluster.
With an investment in a compute node, 10 GB of project storage per core will be made available to that project, user, or group for the duration of the 5-year hardware lifetime (see the storage tiers below).
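As a worked example of the access window and the per-core storage allocation, the minimal Python sketch below assumes that (1) each node's access period runs exactly 5 years from its purchase date, (2) continuous access extends to the latest such end date, ignoring any gap between investments, and (3) each node's 10 GB per core lapses with that node's own lifetime. The `NodePurchase` type and all helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

LIFETIME_YEARS = 5        # hardware lifetime stated in the policy
STORAGE_GB_PER_CORE = 10  # project storage granted per purchased core


@dataclass
class NodePurchase:       # hypothetical record of one node investment
    purchased: date
    cores: int


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 becomes Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def node_end(node: NodePurchase) -> date:
    """First day after the node's 5-year lifetime."""
    return add_years(node.purchased, LIFETIME_YEARS)


def access_end(nodes: list[NodePurchase]) -> date:
    """Assumed: access runs until the latest node lifetime ends."""
    return max(node_end(n) for n in nodes)


def project_storage_gb(nodes: list[NodePurchase], today: date) -> int:
    """10 GB per core, counting only nodes still within their lifetime."""
    active_cores = sum(n.cores for n in nodes if n.purchased <= today < node_end(n))
    return active_cores * STORAGE_GB_PER_CORE
```

For example, hypothetical nodes purchased on August 1, 2016 (20 cores) and January 15, 2018 (24 cores) would give access until January 15, 2023, with 440 GB of project storage until the first node's lifetime ends on August 1, 2021, and 240 GB after that.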
The storage system design for Aeolus envisions multiple tiers; the following defines the tiers and corresponding rules of storage on the cluster.
Tier 0, Fast Scratch – high-speed, volatile (short-lived) storage
- The lifetime of data on /fastscratch is ~2 weeks.
- This fast scratch tier will be implemented with solid-state drives (SSDs) to improve read and write performance.
Tier 1, Compute Storage – storage designed for large projects and for compute input/output.
- Data that has not been used or accessed for 6 months will be moved to Tier 3, Archival Storage.
- Compute storage will have a warm backup/mirror for recovery and restoration of data and to support compliance with NSF grant requirements.
Tier 2, General Storage – user home directories, modules, and logs
- The lifetime of data on general storage is tied to the access period provided by compute node investments or by direct storage purchases.
- This is not intended for job input or output.
Tier 3, Archival Storage – long-term, slow archival storage
- The lifetime of archival storage is tied to the access period provided by compute node investments or by direct storage purchases.
- This tier is not intended for computation and will be read-only.
Tier 4, Backup Storage – offsite, tape, and disk backups
- Home directories are backed up to tape.
- On a case-by-case basis, data that no longer needs to be stored in the Aeolus HPC environment can be backed up and picked up for offsite delivery.
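The two time-based rules in the tier descriptions (roughly 2 weeks on Tier 0, and 6 months without access on Tier 1) can be expressed as a single check. This is a minimal Python illustration, assuming both windows are measured from the data's last access and approximating 6 months as 182 days; the function `tier_action` and its return strings are hypothetical, not an Aeolus tool.

```python
from datetime import date, timedelta

# Approximate windows taken from the tier descriptions; the exact
# thresholds used in practice are assumptions here.
FASTSCRATCH_LIFETIME = timedelta(weeks=2)  # Tier 0: ~2 weeks
COMPUTE_IDLE_LIMIT = timedelta(days=182)   # Tier 1: ~6 months without access


def tier_action(tier: int, last_access: date, today: date) -> str:
    """Suggest what the tier rules imply for one dataset (hypothetical helper)."""
    idle = today - last_access
    if tier == 0 and idle > FASTSCRATCH_LIFETIME:
        return "expired from /fastscratch"
    if tier == 1 and idle > COMPUTE_IDLE_LIMIT:
        return "move to Tier 3 (read-only archival)"
    # Tiers 2-4 follow investment-based or case-by-case rules, not idle time.
    return "keep"
```

Keeping the thresholds as named constants makes the approximation explicit and easy to adjust if the actual enforcement windows differ.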
All users are provided with a 100 GB home directory (see Tier 2) thanks to the generous contributions of the Voiland College of Engineering and Architecture as well as other initial investors.
All users may use the high-performance fast scratch space to store data temporarily, for up to two weeks (see Tier 0).
Additional storage can be purchased in increments of 100 GB. At the end of the 5-year lifetime of such storage, there will be a 6-month grace period for migrating user data to a resource provided by the user. After that, the data will be irrevocably purged.
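Sizing a storage purchase and finding its purge date is simple arithmetic: round the requirement up to whole 100 GB increments, then add the 5-year lifetime and the 6-month grace period to the purchase date. The Python sketch below assumes the lifetime is counted from the purchase date and that month arithmetic clamps to the end of shorter months; the helper names are hypothetical.

```python
import calendar
import math
from datetime import date

INCREMENT_GB = 100        # storage is sold in 100 GB increments
LIFETIME_MONTHS = 5 * 12  # 5-year storage lifetime
GRACE_MONTHS = 6          # migration grace period before purging


def increments_needed(needed_gb: float) -> int:
    """Whole 100 GB increments needed to cover `needed_gb` (rounded up)."""
    return math.ceil(needed_gb / INCREMENT_GB)


def add_months(d: date, months: int) -> date:
    """Shift by calendar months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def purge_date(purchased: date) -> date:
    """Assumed purge date: purchase + 5-year lifetime + 6-month grace period."""
    return add_months(purchased, LIFETIME_MONTHS + GRACE_MONTHS)
```

For instance, 250 GB of additional space requires three increments (300 GB), and storage purchased on August 31, 2016 would be purged on February 28, 2022 under these assumptions.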
By using the resources associated with the Aeolus Cluster, a user acknowledges and agrees to comply with the user policies stated in the Aeolus User Policy.
We distinguish between three types of Aeolus users: Investor, User, and Trial User. The responsibilities of each user type with respect to Aeolus use are outlined below.
This section defines the responsibilities of all users on Aeolus.
All three types of Aeolus users shall:
- Review the User Policy Document
- Review the Aeolus documentation
- Sign up for the Aeolus email list
- Review the presentations on submitting jobs for batch queue execution, as this is vital information for acceptable use of Aeolus.
Investor: An Investor may be either an active user, making actual direct interactive use of Aeolus to run jobs, or a passive user, investing in Aeolus in order to provide Aeolus access for their laboratory’s students and staff.
User: With the approval of an Investor, an individual can be given an Aeolus account. With investment or sponsorship by an Investor, a trial account will be converted to a regular user account for the lifetime of the enabling investment.
Trial User: Upon approval by the HPC Committee, a new user may be given an account for a 6-month trial period, to try out Aeolus HPC in their work and/or to generate data for a proposal.
Policy Review and Revisions
The VCEA HPC Committee will review and revise these policies as necessary, and at least annually. This is the July 2016 version.