
I am writing an implementation of a neural network in Rust and am trying to calculate the dot product of two vectors. I have the following code:

fn dot_product(a: Vec<f64>, b: Vec<f64>) -> f64 {
    // Calculate the dot product of two vectors.
    assert_eq!(a.len(), b.len());
    let mut product: f64 = 0.0;
    for i in 0..a.len() {
        product += a[i] * b[i];
    }
    product
}

This takes two vectors, `a` and `b`, of the same length, multiplies them element-wise, and sums the products: the first value of `a` times the first value of `b`, plus the second value of `a` times the second value of `b`, and so on. For example, the dot product of `[1.0, 2.0]` and `[3.0, 4.0]` is 1.0 × 3.0 + 2.0 × 4.0 = 11.0.

Is there a more efficient way of doing this, and if so, how?

– Teymour Aldridge
  • Looks fine to me. I'd use it until you're confident it's a bottleneck. If you're already sure you need the fastest possible, maybe look into [SIMD](https://rust-lang-nursery.github.io/edition-guide/rust-2018/simd-for-faster-computing.html)? – anderspitman Jan 03 '19 at 19:46
  • You can do it in one line with iterators, like `a.into_iter().zip(b).map(|(a, b)| a*b).sum()`. But I'd expect that to be comparably fast, not significantly faster (or slower). – trent Jan 03 '19 at 20:41
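For reference, here is trent's one-liner expanded into a compilable sketch (using borrowed slices instead of consuming the `Vec`s; otherwise it is the same idea):

fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    // Pair the elements up, multiply each pair, and sum the products.
    a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
}

fn main() {
    println!("{}", dot_product(&[1.0, 2.0], &[3.0, 4.0])); // prints 11
}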

2 Answers


This isn't meant as a comprehensive general answer, but I wanted to share a little code.

Your implementation looks pretty much like what I would do unless I knew it was the bottleneck in my application. Then I'd look into more esoteric approaches (maybe SIMD).

That said, you might consider changing your function to take slice references instead. That way you can pass Vecs or arrays:

fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    // Calculate the dot product of two vectors. 
    assert_eq!(a.len(), b.len()); 
    let mut product = 0.0;
    for i in 0..a.len() {
        product += a[i] * b[i];
    }
    product
}

fn main() {
    println!("{}", dot_product(&[1.0,2.0], &[3.0,4.0]));
    println!("{}", dot_product(&vec![1.0,2.0], &vec![3.0,4.0]));
}
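Taking slices also means callers can pass a sub-range of a `Vec` without copying, which is handy when a matrix is stored as one flat buffer. A small usage sketch reusing the `dot_product` above (values are arbitrary):

fn main() {
    let v = vec![1.0, 2.0, 3.0, 4.0];
    let w = vec![3.0, 4.0, 5.0, 6.0];
    // Dot product over just the first two elements of each vector.
    println!("{}", dot_product(&v[..2], &w[..2])); // prints 11
}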


– anderspitman

I used `rayon` and `packed_simd` to calculate the dot product and found it to be faster than Intel MKL:

extern crate packed_simd;
extern crate rayon;
extern crate time;

use packed_simd::f64x4;
use packed_simd::f64x8;
use rayon::prelude::*;
use std::vec::Vec;

fn main() {
    let n = 100000000;
    let x: Vec<f64> = vec![0.2; n];
    let y: Vec<f64> = vec![0.1; n];

    let res: f64 = x
        .par_chunks(8) // split x into 8-element chunks, processed in parallel
        .map(f64x8::from_slice_unaligned) // load each chunk into a SIMD vector
        .zip(y.par_chunks(8).map(f64x8::from_slice_unaligned))
        .map(|(a, b)| a * b) // lane-wise multiply
        .sum::<f64x8>() // vertical sum across all chunk products
        .sum(); // horizontal sum of the 8 lanes
    println!("res: {}", res);
}

This code is in my GitHub. I hope this helps.
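One caveat: `from_slice_unaligned` panics if handed a chunk shorter than 8 elements, so the code above relies on `n` being a multiple of 8. A minimal sequential sketch that also handles an arbitrary-length tail (the function name `dot_simd` is illustrative):

use packed_simd::f64x8;

fn dot_simd(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let split = a.len() - a.len() % 8;
    // Accumulate full 8-lane chunks in a SIMD register.
    let mut acc = f64x8::splat(0.0);
    for (x, y) in a[..split].chunks_exact(8).zip(b[..split].chunks_exact(8)) {
        acc = acc + f64x8::from_slice_unaligned(x) * f64x8::from_slice_unaligned(y);
    }
    // Horizontal add of the 8 lanes, then finish the tail scalar-wise.
    let mut sum = acc.sum();
    for i in split..a.len() {
        sum += a[i] * b[i];
    }
    sum
}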

  • `use std::vec::Vec;` isn't needed; it's part of the prelude. Why is the time crate here? Can you compare this to the performance of the OP's code and the other answer? Why the choice of `unaligned`? – Shepmaster May 20 '19 at 16:29
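On Shepmaster's comparison question: the answer doesn't include numbers, but a rough harness using only `std::time::Instant` would look something like this (a sketch, not measured results; `dot_naive` stands in for the OP's loop):

use std::time::Instant;

fn dot_naive(a: &[f64], b: &[f64]) -> f64 {
    let mut product = 0.0;
    for i in 0..a.len() {
        product += a[i] * b[i];
    }
    product
}

fn main() {
    let n = 100000000;
    let x = vec![0.2; n];
    let y = vec![0.1; n];

    let start = Instant::now();
    let res = dot_naive(&x, &y);
    // Print the result so the optimizer cannot discard the work.
    println!("naive: {} in {:?}", res, start.elapsed());
}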