Two Macs forming a tiny HPC cluster
Our two-node ‘mini-cluster’: controller + compute.

A Tutorial for Learning HPC Using 2 Macs for an LLM/NLP application

Published on October 22, 2025

In this tutorial I'll teach you how to build yourself a High-Performance Computing (HPC) cluster to train a small LLM for an applied NLP task, without access to a data center with thousands of GPUs, using only two Macs on your desk.

But wait: this isn't a toy demo. We'll create a real, working mini-cluster that mimics what you'd find in AI or scientific computing environments at research centers or universities, containing: (1) a controller node, (2) a compute node, and (3) a job scheduler that distributes tasks among computing resources automatically. Every command, config, and code snippet in this tutorial is as used in production.

By the end, you'll have two Ubuntu VMs (a controller and a compute node) running across your Macs, a Slurm + MUNGE + OpenMPI stack authenticating between them, and a scheduler ready to run the distributed jobs that will train a small LLM for an applied NLP task.

But first, let's understand why HPC matters.

Many AI engineering jobs demand some level of HPC knowledge. The reason is that modern AI engineering often involves computationally intensive training, where correct orchestration of thousands of GPUs (and other computational resources) is required.

HPC makes complex calculations and data processing much faster than any single computer could manage. It is therefore used to solve large-scale problems in science, engineering, and of course, AI. For instance, training a large language model (LLM) requires splitting a large neural network across multiple machines, and HPC is the go-to computer architecture for that.

In the first part, we set up the two Macs for our HPC cluster: we install and configure the software that lets them run Linux VMs. In the second part, we set up the HPC cluster itself, with the 2 Macs and 1 Ubuntu Server, for a simple NLP/LLM task.


PART 1—PREPARE MACS

Step 0: Enable SSH Between Your Two Macs

Decide that Mac A is the controller and Mac B is the compute node; the controller is the busiest node, so give that role to your stronger Mac.

First give each Mac a clear name:

# From MacA or MacB
% hostname

MacB.local

Notice that your local network (most likely your router) may rename your host dynamically before registering it in its DHCP table, so your actual hostname may differ from MacB.local.
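If you'd rather pin the names yourself than rely on the defaults, macOS lets you set them explicitly with scutil. This is just a sketch; substitute whatever names you prefer:

# On Mac B, for example: set a stable computer/host name
% sudo scutil --set ComputerName MacB
% sudo scutil --set LocalHostName MacB
% sudo scutil --set HostName MacB.local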

Turn on Remote Login (SSH) on both Macs:

Notice that this also opens the firewall for SSH (port 22). If you use a third-party firewall, allow inbound TCP 22.
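If you prefer the terminal over System Settings, Remote Login can be toggled and checked from the command line on each Mac (systemsetup needs admin rights):

# Enable and verify Remote Login (SSH)
% sudo systemsetup -setremotelogin on
% sudo systemsetup -getremotelogin
Remote Login: On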

Set up key-based auth on Mac A:

Replace youruserb with Mac B's short username and yourusera with Mac A's short username.

# On Mac A: generate a modern key if you don't have one. In this case we will use the name "id_ed25519", which generates two files:
# ~/.ssh/id_ed25519 (private key — never share)
# ~/.ssh/id_ed25519.pub (public key — this is what you copy)

# On Mac A: generate a new key pair
% ssh-keygen -t ed25519 -C "macA→macB"

Generating public/private ed25519 key pair.
Enter file in which to save the key (/Users/yourusera/.ssh/id_ed25519): 
/Users/yourusera/.ssh/id_ed25519 already exists.
Overwrite (y/n)? yes
Enter passphrase for "/Users/yourusera/.ssh/id_ed25519" (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /Users/yourusera/.ssh/id_ed25519
Your public key has been saved in /Users/yourusera/.ssh/id_ed25519.pub
The key fingerprint is:
SHA256:############################################ macA→macB
The key's randomart image is:
+--[ED25519 256]--+
|                 |
|     . .         |
|      o .o =     |
|   . o ..0O +    |
|  . o ==S+ X     |
|   . ++OE+E .    |
|    . o+oO.=     |
|        o.. .    |
|       o&=..     |
+----[SHA256]-----+

Now we copy the public key from MacA to MacB:

# Copy your public key to Mac B
% ssh-copy-id -i ~/.ssh/id_ed25519.pub youruserb@MacB

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/Users/yourusera/.ssh/id_ed25519.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
(youruserb@macb) Password:

Number of key(s) added:        1

Now try logging into the machine, with: "ssh -i /Users/yourusera/.ssh/id_ed25519 'youruserb@MacB'"
and check to make sure that only the key(s) you wanted were added.

We remove the old stored fingerprint for Mac B from known_hosts:

% ssh-keygen -R MacB.local

# Host MacB.local found: line 2
/Users/yourusera/.ssh/known_hosts updated.
Original contents retained as /Users/yourusera/.ssh/known_hosts.old

Finally, test the connection from Mac A to Mac B, which should succeed without asking for a password:

# Connect to Mac B
% ssh youruserb@MacB.local

Now we are sure that we can connect to Mac B from Mac A.
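Optionally, add a host alias to ~/.ssh/config on Mac A so future hops are shorter. This is a convenience sketch; adjust the names to your setup:

# ~/.ssh/config on Mac A
Host macb
    HostName MacB.local
    User youruserb
    IdentityFile ~/.ssh/id_ed25519

# Now this is enough:
% ssh macb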

Step 1: Install the tooling (on both Macs)

If you don't have Homebrew installed, this is the moment to install it. Note that Homebrew requires the Xcode Command Line Tools (CLT) to function properly on macOS; if they are missing, the installer will pull in the required Xcode dependencies automatically.

# On Mac A and Mac B, install brew if needed:
% /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Since Slurm and MPI are native to Linux, we need to create isolated Ubuntu instances on our Macs that act just like Linux nodes. Each VM runs a real Ubuntu 24.04 environment with its own IP and SSH server. We achieve this by using Multipass, a command-line tool to manage VMs.

For macOS 13 Ventura or later (Mac A), installing Multipass is straightforward:

# In Mac A install Multipass
% brew install --cask multipass

(...)
==> Running installer for multipass with `sudo` (which may request your password)...
Password:
installer: Package name is multipass
installer: Installing at base path /
installer: The install was successful.
🍺 multipass was successfully installed!
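With Multipass installed, you can ask it which Ubuntu images it can launch; the 24.04 LTS image we'll use later should be in the list:

% multipass find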

At the time of writing, the latest version of Multipass is 1.16.1. For macOS 12 Monterey or earlier (Mac B), we need to install an older version of Multipass:

# In Mac B install Multipass
% curl -L -o ~/Downloads/multipass-1.14.1+mac-Darwin.pkg \
     "https://sourceforge.net/projects/multipass.mirror/files/v1.14.1/multipass-1.14.1%2Bmac-Darwin.pkg/download"
% sudo installer -pkg ~/Downloads/multipass-1.14.1+mac-Darwin.pkg -target /

On Intel Macs running Monterey, recent Multipass builds (≥ 1.15) fail because the installer's post-install script assumes Apple's Virtualization framework, which Monterey on Intel doesn't provide. Installing version 1.14.1 avoids that post-install step entirely.
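A quick check that the intended version actually landed (the exact output depends on your install):

% multipass version
multipass   1.14.1+mac
multipassd  1.14.1+mac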

Now, on either Mac A or Mac B, we can open the VM manager to test the installation:

# Open the VM manager
% open -a "Multipass"

The user interface of Multipass (≥ 1.14) running on macOS Tahoe 26.0.1 looks like this:

Multipass UI

Now we can install the other required tools. On Mac A:

# In Mac A install VM runtime and utilities
% brew install git python@3.13 open-mpi cmake pkg-config

On Mac B (macOS 12), we need to handle open-mpi differently, so we start with:

# In Mac B install VM runtime and utilities
% brew install git python@3.13 cmake pkg-config

Then we install OpenMPI from source: on macOS Monterey, OpenMPI itself cannot be installed with brew. So we skip Homebrew for OpenMPI and build it ourselves with Apple's Clang, without Fortran. That gives us a pure, self-contained MPI stack in our home directory.

Make sure you have Apple’s Command Line Tools

# In Mac B install Xcode Command Line Tools
$ xcode-select --install

Download a stable OpenMPI release, for instance 4.1.x, which is widely used on clusters:

#In Mac B install OpenMPI from source (with no Homebrew)
$ cd ~/Downloads

$ curl -LO https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.7.tar.gz
$ tar xzf openmpi-4.1.7.tar.gz
$ cd openmpi-4.1.7

Configure OpenMPI to install into your home directory, using Clang, with Fortran disabled:

# In Mac B compile OpenMPI from source
$ ./configure \
    --prefix=$HOME/opt/openmpi \
    --disable-mpi-fortran \
    CC=clang \
    CXX=clang++

Now build and install OpenMPI:

# In Mac B build and install OpenMPI, using the hardware thread count for a faster build
$ make -j"$(sysctl -n hw.logicalcpu)"
$ make install

Add this OpenMPI to Mac B's PATH by editing ~/.zshrc:

export PATH="$HOME/opt/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/opt/openmpi/lib:$LD_LIBRARY_PATH"
export DYLD_LIBRARY_PATH="$HOME/opt/openmpi/lib:$DYLD_LIBRARY_PATH"

Reload the shell to apply the changes:

$ source ~/.zshrc

Now we verify that OpenMPI works on Mac B:

% which mpicc
/Users/usernameb/opt/openmpi/bin/mpicc

% mpicc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

% which mpirun   
/Users/usernameb/opt/openmpi/bin/mpirun

% mpirun --version
mpirun (Open MPI) 4.1.7

Report bugs to http://www.open-mpi.org/community/help/

Finally, a quick test of MPI on Mac B:

cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

mpicc hello_mpi.c -o hello_mpi
mpirun -np 2 ./hello_mpi

The output should look like this (rank order may vary):

Hello from rank 0 of 2
Hello from rank 1 of 2
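One thing worth knowing: by default Open MPI refuses to start more ranks than there are cores. For laptop experiments you can lift that limit with --oversubscribe:

# e.g. 8 ranks on a 4-core machine
mpirun --oversubscribe -np 8 ./hello_mpi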

On both Macs, we also install jq, a command-line tool for querying and formatting JSON (we'll use it to extract the VM IPs):

# On Mac A and Mac B install jq
brew install jq

We have just installed the following tools:

Multipass → VM manager that hosts our Ubuntu nodes

git, cmake, pkg-config → build/development utilities

python@3.13 → Python runtime on the Macs

open-mpi → MPI implementation (Homebrew on Mac A, built from source on Mac B)

jq → JSON processor for scripting

Step 2: Create the “cluster” VMs

We’ll make one VM per Mac. Name them ctrl (on Mac A) and node1 (on Mac B). Give each 4 vCPU / 6 GB RAM to start.

Let's find out the available networks:

multipass networks

Name   Type       Description
en0    wifi       Wi-Fi
en4    ethernet   Ethernet Adapter (en4)
en5    ethernet   Ethernet Adapter (en5)
en6    ethernet   Ethernet Adapter (en6)

On Mac A, the controller:

% multipass launch 24.04 --name ctrl --cpus 4 --memory 6G --disk 20G --network en0
% multipass info ctrl --format json | jq -r '.info.ctrl.ipv4[0]'
# note this IP as CTRL_IP

On Mac B, the node 1:

% multipass launch 24.04 --name node1 --cpus 4 --memory 6G --disk 20G --network en0 
% multipass info node1 --format json | jq -r '.info.node1.ipv4[0]'
# note this IP as NODE1_IP

Give each VM a friendly hostname:

# In Mac A: set the hostname of the controller VM
$ multipass exec ctrl  -- sudo hostnamectl hostname ctrl

# In Mac B: set the hostname of the compute VM
$ multipass exec node1 -- sudo hostnamectl hostname node1

Now we allow the nodes to see each other: on Mac A, add an entry for node1's VM IP inside ctrl; on Mac B, add an entry for ctrl's VM IP inside node1. (Replace NODE1_IP / CTRL_IP with the addresses you noted.)

# On Mac A
multipass exec ctrl -- bash -lc 'echo "NODE1_IP node1" | sudo tee -a /etc/hosts'

# On Mac B
multipass exec node1 -- bash -lc 'echo "CTRL_IP ctrl" | sudo tee -a /etc/hosts'

Re-run these commands if the router has changed the IPs.

Check the hosts' IPs:

multipass list

Name       State   IPv4             Image
ctrl       Running 192.168.64.3     Ubuntu 24.04 LTS
node1      Running 192.168.64.5     Ubuntu 24.04 LTS
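Before moving on, it's worth confirming the VMs can actually reach each other by name, using the /etc/hosts entries we just added. If these pings fail, double-check that you used the bridged IPs (from --network en0) rather than the internal 192.168.64.x addresses:

# From Mac A: ctrl → node1
multipass exec ctrl -- ping -c 2 node1

# From Mac B: node1 → ctrl
multipass exec node1 -- ping -c 2 ctrl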

Step 3 — Install Slurm + MPI + Munge inside the VMs

multipass exec ctrl -- bash -c "sudo apt update &&
sudo apt install -y build-essential curl git
openmpi-bin libopenmpi-dev
munge libmunge2 libmunge-dev slurm-wlm"

Next, generate a random MUNGE key on ctrl and lock down its permissions:

multipass exec ctrl -- bash -c "sudo dd if=/dev/urandom bs=1 count=1024 of=/etc/munge/munge.key" multipass exec ctrl -- sudo chown munge:munge /etc/munge/munge.key multipass exec ctrl -- sudo chmod 400 /etc/munge/munge.key

1024+0 records in
1024+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.000748547 s, 1.4 MB/s

multipass exec ctrl -- ls -l /etc/munge/munge.key

ls: cannot access '/etc/munge/munge.key': Permission denied

(This is a security requirement: 👉 the munge key must only be readable by the munge service account. So the system is behaving exactly as it should 🔐.)

multipass exec ctrl -- sudo ls -l /etc/munge/munge.key

-r-------- 1 munge munge 1024 Nov 16 23:32 /etc/munge/munge.key

Now confirm the hostname, IP, and hardware that Slurm detects on ctrl:

multipass exec ctrl -- hostname
multipass exec ctrl -- hostname -I
multipass exec ctrl -- slurmd -C

ctrl
192.168.2.10 fd40:deda:e230:183d:5054:ff:fe46:d3c2
NodeName=ctrl CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=5907 UpTime=0-00:36:18

And the same for node1:

multipass exec node1 -- hostname
multipass exec node1 -- hostname -I
multipass exec node1 -- slurmd -C

node1
192.168.64.8 fd0f:7999:d59f:4b71:5054:ff:fefc:ec0c
NodeName=node1 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=5919 UpTime=0-00:39:27


Run the remaining software setup inside the ctrl VM on Mac A:

# Setup for the ctrl VM
$ multipass exec ctrl -- bash -lc '
    sudo apt-get update &&
    sudo apt-get install -y slurm-wlm munge libmunge2 libmunge-dev \
                            openmpi-bin libopenmpi-dev python3-pip
    sudo systemctl stop slurmctld slurmd munge || true
    sudo mkdir -p /etc/slurm /var/spool/slurm /var/log/slurm
    sudo chown -R root:root /etc/slurm
    sudo apt-get install -y build-essential
    # Ubuntu 24.04 marks the system Python as externally managed (PEP 668),
    # so system-wide pip installs need --break-system-packages
    sudo pip3 install --break-system-packages --upgrade pip
    sudo pip3 install --break-system-packages mpi4py torch==2.3.1 transformers==4.44.2 datasets==2.20.0 peft==0.11.1 accelerate==0.34.2 '

Run the same setup inside the node1 VM on Mac B:

# Setup for the node1 VM
$ multipass exec node1 -- bash -lc '
    sudo apt-get update &&
    sudo apt-get install -y slurm-wlm munge libmunge2 libmunge-dev \
                            openmpi-bin libopenmpi-dev python3-pip
    sudo systemctl stop slurmctld slurmd munge || true
    sudo mkdir -p /etc/slurm /var/spool/slurm /var/log/slurm
    sudo chown -R root:root /etc/slurm
    sudo apt-get install -y build-essential
    # Same PEP 668 caveat as on ctrl
    sudo pip3 install --break-system-packages --upgrade pip
    sudo pip3 install --break-system-packages mpi4py torch==2.3.1 transformers==4.44.2 datasets==2.20.0 peft==0.11.1 accelerate==0.34.2 '

Next we configure MUNGE (authentication) and Slurm: generate one MUNGE key on ctrl and copy it to node1.

Create the MUNGE key on the controller (Mac A → ctrl VM):

$ multipass exec ctrl -- sudo mungekey --create
$ multipass exec ctrl -- sudo base64 /etc/munge/munge.key > ~/munge.key.b64

We copy the key to node1 in two hops: first over SSH from Mac A to Mac B (using the key-based login from Step 0), then from each Mac into its VM:

# On Mac A: copy the base64 key to Mac B
$ scp ~/munge.key.b64 youruserb@MacB.local:~/munge.key.b64
# On Mac B: push the key into the node1 VM
$ multipass transfer ~/munge.key.b64 node1:/home/ubuntu/munge.key.b64
# On Mac A: push the key into the ctrl VM
$ multipass transfer ~/munge.key.b64 ctrl:/home/ubuntu/munge.key.b64

Configure MUNGE on both VMs. On Mac A:

# Mac A
multipass exec ctrl -- bash -lc '
    sudo base64 -d /home/ubuntu/munge.key.b64 | sudo tee /etc/munge/munge.key >/dev/null
    sudo chown munge:munge /etc/munge/munge.key
    sudo chmod 400 /etc/munge/munge.key
    sudo systemctl enable munge
    sudo systemctl start munge
'

Do the same for node1:

multipass exec node1 -- bash -lc '
    sudo base64 -d /home/ubuntu/munge.key.b64 | sudo tee /etc/munge/munge.key >/dev/null
    sudo chown munge:munge /etc/munge/munge.key
    sudo chmod 400 /etc/munge/munge.key
    sudo systemctl enable munge
    sudo systemctl start munge
'

Verify MUNGE is working:

#Mac A
multipass exec ctrl -- systemctl status munge
# Mac B
multipass exec node1 -- systemctl status munge

Test that MUNGE credentials round-trip on each node (both nodes share the same key, so a credential minted on one will validate on the other):

# From Mac A, test ctrl:
$ multipass exec ctrl -- munge -n | multipass exec ctrl -- unmunge

STATUS:          Success (0)
ENCODE_HOST:     ctrl (127.0.1.1)
ENCODE_TIME:     2025-11-16 15:44:29 +0100 (1763304269)
DECODE_TIME:     2025-11-16 15:44:29 +0100 (1763304269)
TTL:             300
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             ubuntu (1000)
GID:             ubuntu (1000)
LENGTH:          0

From Mac B, test node1:

$ multipass exec node1 -- munge -n | multipass exec node1 -- unmunge

STATUS:          Success (0)
ENCODE_HOST:     node1 (127.0.1.1)
ENCODE_TIME:     2025-11-16 15:48:19 +0100 (1763304499)
DECODE_TIME:     2025-11-16 15:48:19 +0100 (1763304499)
TTL:             300
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             ubuntu (1000)
GID:             ubuntu (1000)
LENGTH:          0

You can also run the same MUNGE round-trip entirely inside each VM:

multipass exec ctrl -- bash -lc 'munge -n | unmunge'
multipass exec node1 -- bash -lc 'munge -n | unmunge'

Step 4 — Start Slurm and run a first distributed job

At this point both VMs (ctrl, node1) exist, Slurm + MUNGE + OpenMPI + Python are installed, the MUNGE key is shared, and munge -n | unmunge works on each node. What remains is to sync and deploy the Slurm configuration: /etc/slurm/slurm.conf must be identical on both VMs.

We deploy a minimal slurm.conf to both VMs, starting with ctrl on Mac A:

# create slurm.conf on ctrl (Mac A)
$ multipass exec ctrl -- bash -lc 'cat | sudo tee /etc/slurm/slurm.conf >/dev/null' << "SLURMCONF"
ClusterName=maclab
SlurmctldHost=ctrl
MpiDefault=pmix
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldTimeout=120
SlurmdTimeout=300
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurm/slurmd
AuthType=auth/munge
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
SchedulerType=sched/backfill

NodeName=ctrl  CPUs=4 RealMemory=5800 State=UNKNOWN
NodeName=node1 CPUs=4 RealMemory=5800 State=UNKNOWN
PartitionName=debug Nodes=ctrl,node1 Default=YES MaxTime=02:00:00 State=UP
SLURMCONF
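
Now repeat the same heredoc against node1 from Mac B, so the file is byte-for-byte identical (contents repeated here for completeness):

# create the identical slurm.conf on node1 (Mac B)
$ multipass exec node1 -- bash -lc 'cat | sudo tee /etc/slurm/slurm.conf >/dev/null' << "SLURMCONF"
ClusterName=maclab
SlurmctldHost=ctrl
MpiDefault=pmix
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldTimeout=120
SlurmdTimeout=300
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurm/slurmd
AuthType=auth/munge
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
SchedulerType=sched/backfill

NodeName=ctrl  CPUs=4 RealMemory=5800 State=UNKNOWN
NodeName=node1 CPUs=4 RealMemory=5800 State=UNKNOWN
PartitionName=debug Nodes=ctrl,node1 Default=YES MaxTime=02:00:00 State=UP
SLURMCONF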

Slurm can only behave sensibly when slurm.conf is identical on all nodes, so this symmetry is important.
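A quick way to verify that symmetry is to compare checksums from each Mac; the two hashes must match:

# Mac A
% multipass exec ctrl -- md5sum /etc/slurm/slurm.conf
# Mac B
% multipass exec node1 -- md5sum /etc/slurm/slurm.conf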

Next we integrate Slurm with the OS on each VM: add the slurm system user, create spool/log dirs, and wire up systemd. On Mac A we run:

multipass exec ctrl -- bash -lc '
  set -e

  # Slurm service user
  sudo useradd -r -s /bin/false slurm || true

  # State + log directories
  sudo mkdir -p /var/spool/slurm/slurmd /var/log/slurm
  sudo chown -R slurm:slurm /var/spool/slurm /var/log/slurm

  # Make sure Munge is on
  sudo systemctl enable munge
  sudo systemctl restart munge

  # Enable Slurm controller + daemon
  sudo systemctl enable slurmctld
  sudo systemctl enable slurmd
  sudo systemctl restart slurmctld
  sudo systemctl restart slurmd
'

On Mac B, we configure node1:

multipass exec node1 -- bash -lc '
  set -e

  # Slurm service user
  sudo useradd -r -s /bin/false slurm || true

  # State + log directories
  sudo mkdir -p /var/spool/slurm/slurmd /var/log/slurm
  sudo chown -R slurm:slurm /var/spool/slurm /var/log/slurm

  # Munge on
  sudo systemctl enable munge
  sudo systemctl restart munge

  # Slurm node daemon
  sudo systemctl enable slurmd
  sudo systemctl restart slurmd
'

Finally, we make sure the Slurm cluster integration works across both VMs. After starting the Slurm services, run these checks on ctrl to confirm Slurm sees both nodes:

% multipass exec ctrl -- bash -lc 'sinfo'
% multipass exec ctrl -- bash -lc 'scontrol show nodes'

Both ctrl and node1 should appear as part of the debug partition.
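With both nodes up and idle, we can run the first distributed job promised by this step's title. This minimal smoke test asks Slurm for one task on each of the two nodes; assuming the nodes can reach each other, it should print both hostnames (order may vary):

# From Mac A, one task per node via the scheduler
% multipass exec ctrl -- bash -lc 'srun -N2 -n2 hostname'
ctrl
node1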




PART 2—PREPARE THE UBUNTU SERVER

Our third node is an Ubuntu 24.04.3 LTS Server. Make sure the system is up-to-date:

$ sudo apt update && sudo apt upgrade -y

Install required Linux packages

These match what your VMs have, so the cluster stays consistent:

$ sudo apt install -y \
    slurm-wlm munge libmunge-dev libmunge2 \
    openmpi-bin libopenmpi-dev \
    python3 python3-pip python3-venv \
    build-essential git cmake pkg-config

This gives you:

Slurm + Munge → node authentication and job scheduling

OpenMPI → MPI backend

Python 3 → runtime for your scripts

git, cmake, pkg-config → build/development utilities

📝 You don't need brew install python@3.13 here: Ubuntu 24.04 already ships Python 3.12, which is fully compatible with the code in this tutorial.

With Slurm, MPI, and Munge now installed on this node too (matching the VMs), set up your Python environment:

$ mkdir -p ~/hpc-llm/{src,data,checkpoints,out}
$ python3 -m venv ~/hpc-llm/.venv
$ source ~/hpc-llm/.venv/bin/activate
$ pip install --upgrade pip
$ pip install mpi4py torch==2.3.1 transformers==4.44.2 datasets==2.20.0 peft==0.11.1 accelerate==0.34.2

Verification steps:

$ which slurmd
/usr/sbin/slurmd

$ which mpirun
/usr/bin/mpirun

$ python3 -m mpi4py
No path specified for execution
usage: python3 -m mpi4py.run [options] <pyfile> [arg] ...
   or: python3 -m mpi4py.run [options] -m <mod> [arg] ...
   or: python3 -m mpi4py.run [options] -c <cmd> [arg] ...
   or: python3 -m mpi4py.run [options] - [arg] ...
Try `python3 -m mpi4py.run -h` for more information.

Those outputs mean that Slurm and MPI are in a healthy state.
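As a final sanity check, confirm the pinned Python stack imports cleanly inside the venv (torch 2.3.1 is the version we installed above):

$ source ~/hpc-llm/.venv/bin/activate
$ python -c "import torch, transformers, datasets, peft, accelerate; print(torch.__version__)"
2.3.1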

Tags: LLM, AI Engineering