M1-GPU-Compute

Using Swift and Apple’s Metal API to utilize the GPUs on M1-equipped Macs

Adding Two Arrays

CPUs perform computations sequentially, waiting for the previous computation to finish before moving on to the next one. Let’s say we wanted to add two arrays together:

let array1 = [3, 2, 4, 1, 5]
let array2 = [1, 5, 2, 7, 3]

The CPU would go in order, adding each index from left to right. An example CPU implementation:

var result = [0, 0, 0, 0, 0]

// Add the arrays element by element, one index at a time
for i in 0..<array1.count {
    result[i] = array1[i] + array2[i]
}

The CPU would add 3 and 1, place 4 into the result array, then move on to the next index:

3 + 1 = 4
2 + 5 = 7
4 + 2 = 6
1 + 7 = 8
5 + 3 = 8

While this is very easy for the CPU to do and happens in about 0.01 milliseconds, it will get really slow, really fast as the arrays grow to millions or even billions of elements.

This is where a GPU comes in really handy. GPUs are great at doing repetitive, simple tasks, like adding two numbers, but instead of handling elements one at a time, a GPU splits the work across many threads and computes the entire result array in one go.

This is what the GPU function inside compute.metal looks like:

#include <metal_stdlib>
using namespace metal;

kernel void addition_compute_function(constant float *arr1        [[ buffer(0) ]],
                                      constant float *arr2        [[ buffer(1) ]],
                                      device   float *resultArray [[ buffer(2) ]],
                                      uint     index              [[ thread_position_in_grid ]]) {
    resultArray[index] = arr1[index] + arr2[index];
}

It takes in the two input arrays and the result array as parameters, along with uint index [[ thread_position_in_grid ]], which gives each thread the position in the grid it is assigned to. Every thread performs the addition for its own index, so all of the elements are computed in parallel.
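
On the Swift side, the kernel has to be compiled into a pipeline and dispatched with one thread per array element. The project’s actual host code isn’t reproduced here, so the following is a minimal sketch of how that dispatch can look with the Metal API (buffer and variable names are illustrative):

import Metal

// Set up the device, the compiled kernel, and a command queue.
// makeDefaultLibrary() assumes compute.metal is compiled into the
// target's default library (as it is in an Xcode project).
let device   = MTLCreateSystemDefaultDevice()!
let library  = device.makeDefaultLibrary()!
let function = library.makeFunction(name: "addition_compute_function")!
let pipeline = try! device.makeComputePipelineState(function: function)
let queue    = device.makeCommandQueue()!

let array1: [Float] = [3, 2, 4, 1, 5]
let array2: [Float] = [1, 5, 2, 7, 3]
let count  = array1.count
let length = count * MemoryLayout<Float>.stride

// Shared buffers are visible to both the CPU and GPU on Apple silicon.
let buf1 = device.makeBuffer(bytes: array1, length: length, options: .storageModeShared)!
let buf2 = device.makeBuffer(bytes: array2, length: length, options: .storageModeShared)!
let bufResult = device.makeBuffer(length: length, options: .storageModeShared)!

// Encode the kernel with the same buffer indices used in compute.metal.
let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buf1,      offset: 0, index: 0)
encoder.setBuffer(buf2,      offset: 0, index: 1)
encoder.setBuffer(bufResult, offset: 0, index: 2)

// One thread per array element; Metal divides the grid into threadgroups.
let gridSize        = MTLSize(width: count, height: 1, depth: 1)
let threadsPerGroup = MTLSize(width: min(pipeline.maxTotalThreadsPerThreadgroup, count),
                              height: 1, depth: 1)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadsPerGroup)
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// Read the result back out of the shared buffer.
let pointer = bufResult.contents().bindMemory(to: Float.self, capacity: count)
print(Array(UnsafeBufferPointer(start: pointer, count: count))) // [4.0, 7.0, 6.0, 8.0, 8.0]

The dispatchThreads(_:threadsPerThreadgroup:) call is what actually fans the work out: the GPU runs the kernel once for every position in the grid, which is how the whole result array gets computed in one go.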

Results

My specs: M1 Max 10-core CPU, 24-core GPU, 32GB RAM

Array Size       CPU Time (seconds)   GPU Time (seconds)
            5               0.00001              0.00070
      100,000               0.03356              0.00707
    1,000,000               0.33692              0.00990
   50,000,000              16.86406              0.07883
  100,000,000              33.44142              0.15101
  500,000,000                   N/A              0.80057
1,000,000,000                   N/A              1.41739
1,700,000,000                   N/A             24.19244

Note: Adding two arrays of 1.7 billion elements each uses all 32GB of memory, which likely explains the jump in GPU time at that size

These timings cover only the adding portion of the function. Creating the arrays and populating them with random values also takes a considerable amount of time, and that step can likewise be accelerated by the GPU
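
For reference, one straightforward way to time just the addition step on the CPU (the project may measure it differently) is to bracket the loop with CFAbsoluteTimeGetCurrent():

import Foundation

let n = 1_000_000
let a = (0..<n).map { _ in Float.random(in: 0...10) }
let b = (0..<n).map { _ in Float.random(in: 0...10) }
var sum = [Float](repeating: 0, count: n)

// Time only the addition, not the array creation above.
let start = CFAbsoluteTimeGetCurrent()
for i in 0..<n {
    sum[i] = a[i] + b[i]
}
print("CPU time: \(CFAbsoluteTimeGetCurrent() - start) seconds")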
