
I am trying to use Support Vector Regression on a (neurophysiological) dataset in which the position of points on a circular manifold in $N$ dimensions is correlated with a circular variable (the phase of an oscillation in a separate system).

Before working on the real data, I am trying to solve this on a dummy dataset. I have points $X$ on a circle, plus some noise, and each point is associated with a circular variable $y$, which is the angle of the point, plus some noise. This recapitulates my dataset reasonably well; the real one just has more dimensions and a less regular circular manifold. I mention this in case there are better approaches than the one I describe below.

```python
from matplotlib import pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

n_points = 1000
rad_sd, phase_sd = 0.2, 0.5

angles = np.random.rand(n_points) * 2 * np.pi
X = np.array([np.cos(angles) + np.random.randn(n_points) * rad_sd,
              np.sin(angles) + np.random.randn(n_points) * rad_sd]).T
y = angles + np.random.randn(n_points) * phase_sd

f, ax = plt.subplots(figsize=(3, 2.5))
im = plt.scatter(X[:, 0], X[:, 1], c=y)
ax.set(xlabel="$x_1$", ylabel="$x_2$")
plt.colorbar(im, label="phase $y$")
plt.tight_layout()
```

[Figure: the dummy data in the $(x_1, x_2)$ plane, colored by phase $y$]

I am then fitting the vanilla SVR to the data:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

regr = svm.SVR()
regr.fit(X_train, y_train)
predicted = regr.predict(X_test)

f, ax = plt.subplots(figsize=(4, 3))
ax.scatter(y_test, predicted, lw=0)
ax.set(ylabel="Predicted $y$", xlabel="Test $y$")
```

[Figure: predicted vs. test $y$ for the default SVR]

This looks very good, apart from a problematic transition between $0$ and $2\pi$.

I guess what I need is to somehow make the optimiser aware that this is a circular variable, so that distances for the fit are computed accordingly. However, it is not clear to me how I could add a step to the pipeline, or modify the loss function, to get that distance computed. I have the feeling that any kind of optimisation I try in scikit-learn would run into this issue.
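To make the "circular distance" concrete, here is a minimal NumPy sketch. The helper `circular_diff` is a hypothetical name used only for illustration; it is not a scikit-learn function. It wraps the difference of two angles into $[-\pi, \pi)$, so two points on either side of the $0/2\pi$ seam, which sit almost on top of each other in $X$, also come out close in target space. A plain (linear) difference, which is what a standard regression loss sees, puts them almost $2\pi$ apart.

```python
import numpy as np

def circular_diff(a, b):
    """Signed angular difference a - b, wrapped into [-pi, pi).

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    return (np.asarray(a) - np.asarray(b) + np.pi) % (2 * np.pi) - np.pi

# Two angles just either side of the 0 / 2*pi seam
a, b = 0.05, 2 * np.pi - 0.05

# Their (noise-free) positions on the unit circle are almost identical...
xa = np.array([np.cos(a), np.sin(a)])
xb = np.array([np.cos(b), np.sin(b)])
print("distance in X:", np.linalg.norm(xa - xb))      # about 0.1

# ...but the plain target difference is almost 2*pi,
print("naive |a - b|:", abs(a - b))                    # about 6.18
# while the wrapped (circular) difference is small.
print("circular |a - b|:", abs(circular_diff(a, b)))   # about 0.1
```

A regressor trained with a squared error on the raw angle is therefore asked to map nearly identical inputs to targets that are far apart, which is what produces the problematic transition near $0$ and $2\pi$.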

Any clue? Thank you very much in advance for any pointers that could help me with this problem!

vigji

1 Answer


I found a solution thanks to other posts here. It is possible to fit the model to more than one target, so I regressed $\sin(y)$ and $\cos(y)$ with a `MultiOutputRegressor` and took the $\arctan2$ of the two predictions:

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Encode the circular target as (sin, cos) pairs
y_train_multi = np.array([np.sin(y_train), np.cos(y_train)]).T
y_test_multi = np.array([np.sin(y_test), np.cos(y_test)]).T

# MultiOutputRegressor fits one SVR per output column
regressor = MultiOutputRegressor(SVR(kernel='rbf', C=1e3, gamma=0.1))
regressor.fit(X_train, y_train_multi)
prediction = regressor.predict(X_test)

# Decode back to an angle: arctan2(sin, cos), which lies in (-pi, pi]
angle_predicted = np.arctan2(prediction[:, 0], prediction[:, 1])

f, ax = plt.subplots(figsize=(4, 3))
ax.scatter(y_test, angle_predicted, lw=0)
ax.set(ylabel="Predicted $y$", xlabel="Test $y$")
```

[Figure: predicted vs. test $y$ for the sin/cos model]
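A short NumPy check of why the sin/cos encoding sidesteps the seam. It uses synthetic angles (`y` and a noisy stand-in `y_hat`, both illustrative names, not model output) rather than the SVR itself. Two facts matter. First, `np.arctan2` only uses the direction of the $(\sin, \cos)$ pair, so the decoded angle is still valid even though the two independent regressors are not constrained to output unit-length pairs. Second, for unit-length pairs the squared error summed over both outputs equals $2 - 2\cos(\hat y - y)$, which depends only on the angular gap, i.e. it is a circular loss. Since `np.arctan2` returns values in $(-\pi, \pi]$, the decoded angle can be wrapped with `% (2 * np.pi)` before comparing it with angles expressed on $[0, 2\pi)$.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0, 2 * np.pi, size=5)      # synthetic "true" angles in [0, 2*pi)
y_hat = y + rng.normal(0, 0.3, size=5)     # synthetic stand-in for predicted angles

# Encode as (sin, cos), the same column order as in the answer above
enc = np.column_stack([np.sin(y_hat), np.cos(y_hat)])

# Regressors need not output unit-length pairs: a positive rescaling of
# each (sin, cos) row does not change the angle recovered by arctan2
scaled = enc * rng.uniform(0.5, 2.0, size=(5, 1))
decoded = np.arctan2(scaled[:, 0], scaled[:, 1])   # in (-pi, pi]

# Wrap into [0, 2*pi) for comparison with angles on that range
decoded_wrapped = decoded % (2 * np.pi)

# Squared error summed over the sin and cos outputs...
sq_err = (np.sin(y_hat) - np.sin(y)) ** 2 + (np.cos(y_hat) - np.cos(y)) ** 2
# ...equals 2 - 2*cos(y_hat - y), which depends only on the angular gap
circ_loss = 2 - 2 * np.cos(y_hat - y)

print(np.allclose(sq_err, circ_loss))   # prints True
```

The identity follows from expanding the squares and using $\sin^2 + \cos^2 = 1$ and $\cos a \cos b + \sin a \sin b = \cos(a - b)$.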

User1865345
vigji