
The Partial Least Squares (PLS) algorithm is implemented in the scikit-learn library, as documented here: http://scikit-learn.org/0.12/auto_examples/plot_pls.html When y is a binary vector, a variant of this algorithm is used: Partial Least Squares Discriminant Analysis (PLS-DA). Does the PLSRegression module in sklearn.pls also implement this binary case? If not, where can I find a Python implementation of it? In my binary case, I'm trying to use PLSRegression:

pls = PLSRegression(n_components=10)
pls.fit(x, y)
x_r, y_r = pls.transform(x, y, copy=True)

In the transform function, the code raises an exception at this line:

y_scores = np.dot(Yc, self.y_rotations_)

The error message is "ValueError: matrices are not aligned". Yc is the normalized y vector, and self.y_rotations_ = [1.]. In the fit function, self.y_rotations_ is set to np.ones(1) if the original y is a univariate vector (y.shape[1] == 1).

Noam Peled
  • Did you ever resolve this? I have tried the same method (using the latest version of scikit-learn) and it seems to do PLS-DA perfectly. The key is to label classes with 1 and 0 (for same/other class). If you still can't get it to work, can you post your data? – mfitzp Oct 04 '13 at 10:54
  • Haven't resolved it yet, but I'll try user3178149 solution. Thanks for offering your help! – Noam Peled Feb 10 '14 at 07:01
  • @mfitzp Is partial least squares regression the same as partial least squares discriminant analysis? I am trying to figure out how to get plots from the first two components. – O.rka Jul 29 '16 at 20:07
  • @O.rka correct, PLSDA for two groups is just PLS Regression against a binary variable (0 or 1) representing group membership. See [here](http://mfitzp.io/article/partial-least-squares-discriminant-analysis-plsda/) for a longer write-up. – mfitzp Jul 29 '16 at 20:29
  • Thanks for that. I've just recently gotten introduced to ordination and I want to understand it before I start implementing it. Wow. AMAZING tutorial – O.rka Jul 29 '16 at 20:35

3 Answers


PLS-DA is really a "trick" to use PLS for categorical outcomes instead of the usual continuous vector/matrix. The trick consists of creating a dummy indicator matrix of zeros and ones that represents membership in each of the categories. So if you have a binary outcome to be predicted (i.e. male/female, yes/no, etc.), your dummy matrix will have TWO columns representing membership in either category.

For example, consider the outcome gender for four people: 2 males and 2 females. The dummy matrix should be coded as:

import numpy as np
dummy=np.array([[1,1,0,0],[0,0,1,1]]).T

where each column represents membership in one of the two categories (male, female).

Then your model for data in variable Xdata (shape: 4 rows, arbitrary columns) would be:

myplsda=PLSRegression().fit(X=Xdata,Y=dummy)

The predicted categories can be extracted by comparing the two indicator variables in mypred:

mypred= myplsda.predict(Xdata)

For each row/case the predicted gender is that with the highest predicted membership.

markcelo
  • If your data contains only two classes, it is better to represent y as a single column, do the regression, and identify the class using a threshold halfway between the two class values; for example, if 1 stands for one class and -1 for the other, the threshold is 0. Problems arise if a matrix y is used, which is also why PLS-DA is not recommended for multiclass problems. See the paper _Partial least squares discriminant analysis: taking the magic away_ for a detailed discussion. – Elkan Jan 23 '17 at 05:44

You can use the Linear Discriminant Analysis package in SKLearn; it will take integers for the y value:

LDA-SKLearn

Here is a short tutorial on how to use the LDA: sklearn LDA tutorial

Kyle54

Not exactly what you are looking for, but you might want to check these two threads: one about how to call native C/C++ code from Python, and one about a C++ PLS library implementation:

Partial Least Squares Library

Calling C/C++ from Python?

You can use Boost.Python to expose the C++ code to Python. Here is an example taken from the official site:

Following C/C++ tradition, let's start with the "hello, world". A C++ Function:

char const* greet()
{
   return "hello, world";
}

can be exposed to Python by writing a Boost.Python wrapper:

#include <boost/python.hpp>

BOOST_PYTHON_MODULE(hello_ext)
{
    using namespace boost::python;
    def("greet", greet);
}

That's it. We're done. We can now build this as a shared library. The resulting DLL is now visible to Python. Here's a sample Python session:

>>> import hello_ext
>>> print hello_ext.greet()
hello, world
0x90