
Does anyone know how I can pass my pandas DataFrame by reference to C++ with pybind11? It holds a mix of strings, floats and ints (see the example below). I can convert it to an N-dimensional NumPy array if necessary, but a NumPy array can't mix types, and I believe that with mixed types I can't access the data by reference. Compute efficiency and speed are the reason I'm now using C++ with pybind11.
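A quick way to see the mixed-type problem, as a sketch on a hypothetical two-row cut of the DataFrame below: converting the whole frame gives NumPy's object dtype (an array of pointers to Python objects, not a flat block of numbers), whereas selecting only the numeric columns gives a plain float64 buffer of the kind pybind11 can read in place.

```python
import numpy as np
import pandas as pd

# Illustrative two-row subset of the question's DataFrame.
df = pd.DataFrame({
    "CCY": ["KRW", "KRW"],
    "TENOR": ["Tom Fix", "1W1M"],
    "BID": [1, 1],
    "ASK": [5, 1],
    "SPREAD": [0.8, 0.07],
})

# Strings mixed with numbers force the object dtype: each cell is a pointer
# to a Python object, so C++ cannot treat the buffer as doubles.
mixed = df.to_numpy()
print(mixed.dtype)  # object

# Keeping only the numeric columns yields a real float64 buffer.
numeric = df.select_dtypes(include="number").to_numpy(dtype=np.float64)
print(numeric.dtype, numeric.shape)  # float64 (2, 3)
```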

Is it even sensible to try to access a pandas DataFrame directly, or would it be better to just use an N-dimensional NumPy array, since its data is stored contiguously in memory? In other words, what is the best practice for storing table-like data efficiently?
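One common layout, offered as a sketch rather than the definitive answer: keep the numeric columns in a single C-contiguous (row-major) float64 array and carry a separate name-to-column-index dict (the names block and col_index are illustrative). Building the array from a DataFrame produces a new buffer, so anything C++ later writes into it will not show up in the DataFrame; if in-place updates matter, the NumPy array would need to be the primary storage.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"BID": [1, 1], "ASK": [5, 1], "SPREAD": [0.8, 0.07]})

# One row-major float64 block plus a lookup from column name to column index.
cols = list(df.columns)
col_index = {name: i for i, name in enumerate(cols)}
block = np.ascontiguousarray(df[cols].to_numpy(dtype=np.float64))
print(block.flags["C_CONTIGUOUS"])  # True

# The block is a separate buffer: writing to it (as C++ would) leaves df alone.
block[0, col_index["BID"]] = 99.0
print(df.loc[0, "BID"])  # 1
```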

I believe the only way to pass data by reference is as an N-dimensional NumPy array, whose memory pybind11 can access in place.
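Whether access is really in place depends on whether a dtype conversion is needed. The snippet below uses plain NumPy as a stand-in for what happens at the pybind11 boundary (an analogy, not pybind11 itself): writes through an existing float64 array reach the caller, but requesting float64 from an int64 array has to allocate a new buffer, and writes to that copy never reach the original.

```python
import numpy as np

def scale_in_place(a: np.ndarray) -> None:
    # Python stand-in for array_wrapper: modifies the caller's buffer directly.
    a *= 3

doubles = np.array([[1.0, 2.0], [3.0, 4.0]])
scale_in_place(doubles)
print(doubles[0, 0])  # 3.0

ints = np.array([[1, 2], [3, 4]])
# int64 -> float64 needs a new buffer, so edits to as_float do not reach ints.
as_float = np.asarray(ints, dtype=np.float64)
print(np.shares_memory(ints, as_float))  # False

# No conversion needed: asarray hands back the very same array object.
print(np.asarray(doubles, dtype=np.float64) is doubles)  # True
```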

I had the idea that I might need to convert the N-dimensional array into a map of 1-D vectors, e.g.:

map<string, map<string, vector<double>>> map_vectors;

However, I don't think this is efficient, and I can't have one vector holding strings, ints and doubles, so I couldn't access those values by reference.
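Instead of a map of vectors, one alternative (an assumption about what suits this data, not the only option) is to turn each string column into integer codes with pd.factorize, so every column becomes numeric and can live in a contiguous array, while the code-to-label lookup stays on the Python side.

```python
import numpy as np
import pandas as pd

tenors = pd.Series(["Tom Fix", "1W1M", "1x2", "1W1M"])

# codes: one integer per row; labels: the distinct strings in first-seen order.
codes, labels = pd.factorize(tenors)
print(codes)         # [0 1 2 1]
print(list(labels))  # ['Tom Fix', '1W1M', '1x2']

# Cast to int32 if the C++ side expects a 32-bit int buffer.
codes32 = codes.astype(np.int32)
```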

Formatted pandas DataFrame: [image not available]

I would be very grateful for any tips or advice on how to emulate the pandas DataFrame or NumPy array, or access it in place by reference.
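If whole rows need to travel together, another option is a NumPy structured array: a single contiguous buffer in which each row is a packed C-style record. The field layout below (an 8-byte fixed-width ASCII string plus two doubles) is hypothetical. pybind11 documents a PYBIND11_NUMPY_DTYPE macro for mapping such records onto a C++ struct; check its documentation for the field types it supports before relying on it. Note that a string filling all 8 bytes is not null-terminated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"TENOR": ["Tom Fix", "1W1M"], "BID": [1, 1], "SPREAD": [0.8, 0.07]})

# Hypothetical record layout: packed, so 8 + 8 + 8 = 24 bytes per row.
row_dtype = np.dtype([("TENOR", "S8"), ("BID", "f8"), ("SPREAD", "f8")])

records = np.empty(len(df), dtype=row_dtype)
# Fixed-width byte strings: encode the text column as ASCII first.
records["TENOR"] = np.char.encode(df["TENOR"].to_numpy(dtype=str), "ascii")
records["BID"] = df["BID"].to_numpy(dtype=np.float64)
records["SPREAD"] = df["SPREAD"].to_numpy(dtype=np.float64)

print(records.itemsize)     # 24
print(records[0]["TENOR"])  # b'Tom Fix'
```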

Sample C++ pybind11 code for accessing the N-dimensional NumPy array by reference (for a NumPy array of doubles only):

#include <stdexcept>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// py::array_t is a handle to the NumPy array, so taking it by value does not
// copy the data; writes below modify the caller's array in place.
void array_wrapper(py::array_t<double, py::array::c_style> in_results) {

    if (in_results.ndim() != 2) {
        throw std::runtime_error("Results should be a 2-D Numpy array");
    }

    // Direct 2-D access to the buffer without per-element bounds checks.
    auto r = in_results.mutable_unchecked<2>();

    // Loop over every row (shape(0)) instead of a hard-coded 2.
    for (py::ssize_t i = 0; i < r.shape(0); i++) {
        for (py::ssize_t j = 0; j < r.shape(1); j++) {
            double& v = r(i, j);
            if (v > 100 && v < 150) {
                v *= 2;
            }
            else if (v < 100) {
                v *= 3;
            }
            else {
                v = 5.55;
            }
        }
    }
}


PYBIND11_MODULE(NDF_CPP_module, module_handle) {
    module_handle.doc() = "I'm a docstring hehe";
    // Register the function; noconvert() rejects arrays that are not already
    // C-contiguous float64 instead of silently working on a converted copy.
    module_handle.def("array_wrapper", &array_wrapper, py::arg("in_results").noconvert());
}


pandas DataFrame in Python

import pandas as pd

ccy = 'KRW'

temp_df = pd.DataFrame([{"CCY":ccy, "TENOR":"Tom Fix","BID": 1, "ASK": 5,"MID":7, "SPREAD": 0.8, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1W1M","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.07, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1x2","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.07, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1x3","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.07, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1x6","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.07, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1x9","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.1, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0},
                    {"CCY":ccy,"TENOR":"1x12","BID": 1, "ASK": 1,"MID":1, "SPREAD": 0.1, "SKEW": 0,"BID_override":"","ASK_override": "", "BID_output": 5,"ASK_output": 6, "MID_output": 8, "POINTS_SCALE":0,"BB MID": 8, "CHK": 8, "DAYS":0}])

temp_df
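Applied to the shape of temp_df, a minimal sketch (two illustrative rows and a subset of the columns): the empty-string override columns are what keep those columns non-numeric, so coercing them to NaN first lets them join the float64 block. errors="coerce" assumes any non-numeric override should become NaN; it would also silently hide typos in that column.

```python
import numpy as np
import pandas as pd

# Two rows in the shape of temp_df; "" in BID_override means "no override".
df = pd.DataFrame([
    {"CCY": "KRW", "TENOR": "Tom Fix", "BID": 1, "SPREAD": 0.8, "BID_override": "", "DAYS": 0},
    {"CCY": "KRW", "TENOR": "1W1M", "BID": 1, "SPREAD": 0.07, "BID_override": "", "DAYS": 0},
])

# Empty strings cannot be parsed as numbers, so coerce turns them into NaN.
df["BID_override"] = pd.to_numeric(df["BID_override"], errors="coerce")

numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)  # ['BID', 'SPREAD', 'BID_override', 'DAYS']

block = np.ascontiguousarray(df[numeric_cols].to_numpy(dtype=np.float64))
print(block.shape)   # (2, 4)
```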
Py_trader
  • Take a look at https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array. Like you wrote, Pybind11 supports numpy arrays. – unddoch Apr 23 '22 at 18:14
