Manual for hpc4you_toolkit.
v3 is v2 plus the web interfaces. In a word, v3 features a parallel computing cluster with a web front-end.
Click for Demo.
The hpc4you_toolkit is a simple but robust toolkit written by a computational chemist to set up a parallel computing cluster for scientific research.
No computer skills or Linux knowledge are needed. Just copy and paste the commands from the screen, then press the Enter key.
If you already know how parallel computing clusters work, how to administer Linux from the command line, and how to configure Linux networking, then you can also try out the OpenHPC solution.
Currently, the hpc4you_toolkit supports:
unzip -qo hpc4you*.zip; source code
All five subsequent commands will be displayed in green automatically.
All you need to do is copy and paste the green text according to the on-screen instructions.
Choose one machine as the management/master/login node, and run the following command:
curl https://raw.githubusercontent.com/hpc4you/hpc/main/getInfo.sh | bash
Follow the on-screen prompts carefully.
Remember to copy and paste the blue lines into the body of your email, and please also attach the XXX.dat file to the email.
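If you prefer to read getInfo.sh before piping it into bash, an equivalent two-step variant of the command above is:

```bash
# Download the script first, review it, then run it.
curl -o getInfo.sh https://raw.githubusercontent.com/hpc4you/hpc/main/getInfo.sh
less getInfo.sh
bash getInfo.sh
```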
On the login node, edit file /etc/hosts.
If you have configured the hostname of the master/login node as master, and the hostnames of the computing/slave nodes as nodeXX, the content of the file /etc/hosts should be similar to the following.
### the IP and the corresponding real hostname
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2
192.168.1.112 node12
If the hostname of the login/master node is not master, and/or the hostname of any computing/slave node is not nodeXX, the content of the file /etc/hosts should be similar to the following.
# the IP and real hostname
192.168.1.100 server0
192.168.1.101 server1
192.168.1.102 server2
192.168.1.112 server12
### the IP and the corresponding alias hostname
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2
192.168.1.112 node12
In this example, the output of hostname is server0 on the login node, and server12 on the computing node which can be accessed via IP 192.168.1.112. Use nmtui to set the hostname and configure the IP address. In v3, the default hostname still follows the master/nodeXX pattern, and you'd better follow this nomenclature.
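If you prefer a non-interactive alternative to nmtui, hostnamectl and nmcli can apply the same settings; a minimal sketch, where the connection name eth0, the IP address, and the gateway are placeholders for your own values:

```bash
# Set the node's hostname (run on the node itself).
hostnamectl set-hostname node12

# Give the existing NetworkManager connection a static address.
nmcli connection modify eth0 ipv4.method manual \
    ipv4.addresses 192.168.1.112/24 ipv4.gateway 192.168.1.1
nmcli connection up eth0
```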
It is possible to customize the machine name using the methods described above. After customizing the hostname, you may need to refer to the Open OnDemand manual (https://osc.github.io/ood-documentation/release-3.0/index.html) to make the necessary configuration changes.
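Whichever naming scheme you choose, you can confirm that the entries in /etc/hosts resolve as intended with standard tools:

```bash
# Look up the alias via the system resolver (reads /etc/hosts as well).
getent hosts node12
# A single ping confirms the node is reachable under that name.
ping -c 1 node12
```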
Upload the package hpc4you_toolkit-XXX.zip to the login node.
SSH into the login node and run:
unzip -qo hpc4you*.zip; source code
Follow the on-screen prompts carefully.
You will be asked to copy and paste ./step1.sh
into the current terminal. Please just do it.
Please wait a while (the waiting time depends on the network bandwidth); you will then see on the screen,
Default root password for all slave nodes
Please type the root password, then press the Enter key.
Nothing to do but wait …
Copy and paste the green line, then press the Enter key, and wait …
All servers will automatically reboot at least twice, and then the slurm scheduling cluster is ready for you.
In particular, the restart of all compute nodes will lag 1 minute behind the master node.
Nothing else to do, just wait.
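Once the reboots have finished, two standard slurm commands (not specific to the toolkit) give a quick sanity check that the cluster is ready:

```bash
# All compute nodes should be listed, ideally in the idle state.
sinfo
# Run a trivial one-node job; it should print a compute node's hostname.
srun -N 1 hostname
```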
By default, /opt and /home on the master/login node are shared among all slave nodes. It is also possible to add new shared paths.
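The toolkit may provide its own helper for this; as a generic NFS sketch (the /data path and the 192.168.1.0/24 network are placeholders), exporting one more directory from the master to the slave nodes usually looks like:

```bash
# On the master/login node: export the directory (placeholder path and subnet).
echo '/data 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra

# On each slave node: mount the share; add a matching line to /etc/fstab
# to make it permanent across reboots.
mkdir -p /data
mount -t nfs master:/data /data
```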
On the login node, run
useradd_hpc tom
This will add user tom to the default group users. Alternatively, run
useradd_hpc chem tom
This will add user tom to the group chem.
Caution:
sacctmgr add user tom Account=hpc4you
This gives user tom the default account hpc4you. Refer to the slurm manual for more details.
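To verify the result, standard sacctmgr queries will list the user and its account association, for example:

```bash
# Show the accounting record for user tom.
sacctmgr show user tom
# Show which account(s) tom is associated with.
sacctmgr show associations where user=tom format=cluster,account,user
```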
On the login/master node, run
userdel_hpc tom
In this case, user tom will be deleted.
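If the user had also been registered with slurm accounting (as in the sacctmgr example above), that record can be removed as well; a minimal sketch:

```bash
# Remove tom's accounting record; sacctmgr asks for confirmation.
sacctmgr remove user tom
```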
Power on the master node and all switches first, and then power on all computing nodes.
On the login/master node, run poweroff_hpc.
On the login/master node, run reboot_hpc.
There are two approaches available.

Approach 1, step by step:
1. On the new node, use nmtui to set the hostname and configure the IP address.
2. On the login node, add the new node to /etc/hosts, then run setup_hpc --sync_file /etc/hosts.
3. On the new node, run slurmd -C | head -n 1, and please copy the output (see the sketch below).
4. On the login node, run setup_hpc --sync_do 'systemctl restart slurmd'; systemctl restart slurmctld.

Approach 2: run addNewComputeNode.sh.
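In step 3, the line printed by slurmd -C is already in slurm.conf node-definition format (slurmd -C also prints an UpTime line, hence the head -n 1). A minimal sketch of the usual follow-up on the master node, assuming the common /etc/slurm/slurm.conf path (it may differ on your installation):

```bash
# On the new node: print its hardware description in slurm.conf format.
slurmd -C | head -n 1

# On the master node: append the copied NodeName=... line to the node
# definitions in slurm.conf, then push the file to all nodes.
vi /etc/slurm/slurm.conf
setup_hpc --sync_file /etc/slurm/slurm.conf
# Finally restart the slurm daemons as in step 4 above.
```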
Run passwd to change the root password, then run setup_hpc --sync_user.

Run enhance_security.sh to apply awesome security hardening configurations automatically (v2 only).

On the login/master node, run
setup_hpc --sync_file /full/path/to/file
For example, setup_hpc --sync_file /etc/hosts
will sync the hosts file on the master node to all slave nodes.
On the login/master node, run
setup_hpc --sync_do cmd
For example,
setup_hpc --sync_do uptime
will print the uptime info for all slave nodes.
setup_hpc --sync_do 'systemctl restart slurmd; ntpdate -u 3.cn.pool.ntp.org'
will restart the slurm client and sync the time on all slave nodes.
hpc4you_toolkit solo is also available; it deploys slurm on a workstation simply by running source code.
FEATURES | Basic | Adv | Pro |
---|---|---|---|
CPU Scheduling | β | β | β |
GPU Scheduling1 | β | β | β |
Job Log | β | β | β |
User Control2 | β | β | β |
Monitoring Historical | β | β | β |
Monitoring Realtime | β | β | β |
Security & OS Optimize3 | β | β | β |
Pricing | 399 USD | Email ask@hpc4you.top | Email ask@hpc4you.top |
FEATURES | Basic | Adv |
---|---|---|
CPU Scheduling | β | β |
GPU Scheduling | β | β |
Job Log | β | β |
Pricing | 99 USD | 149 USD |
Real-time demo on how to set up HPC for scientific research with hpc4you_toolkit. Only copy and paste. No computer skills or Linux knowledge are required.
For those who have no patience to watch the real-time videos: source code, demonstration of hpc4you_toolkit solo (BV1Gg411R7tt).

In the field of parallel computing clusters, we refer to a server as a node. Usually, in small-scale clusters, the login server, the storage node, and the controller server are merged into a single server, which is called the login/master node. Accordingly, we refer to all compute nodes as slave nodes.
SLURM natively supports GPU scheduling. However, the auto-detection mode often fails and needs to be configured manually once. If you can't do it, I can lend a helping hand, but please pay the fee. ↩
To prevent users from SSHing into nodes on which they do not have a running job, and to track the SSH connection and any other spawned processes for accounting and to ensure complete job cleanup when the job is completed. More info ↩
Reference: https://github.com/decalage2/awesome-security-hardening ↩