Let me start by saying this: I am an intern at a company with less than a year of experience, so what I ask or write might be slightly naive. I don't even know if this is the right forum for this question.
Problem breakdown: We are currently using a Contec DIO-3232T-PE board to output data from a Windows PC to an external device and to read data back from it. The external device (let's call it TYC) receives the data, does some processing, and sends data back when requested. However, setting the pins takes 2.2us whether I set 1, 8, 16 or 32 of them. All the code is written in Microsoft Visual Studio, and a function provided by the Contec driver is called to write the DIO pins. My job is to speed up communication between the Windows PC and TYC.
What I have tried: I tried optimizing the code, but it still takes 2.2us to set the pins. I swapped the Contec board for another similar board and got 1.6us, but that is nowhere near the target of 0.2us. I then experimented with a Raspberry Pi and was able to toggle pins back and forth in about 0.02us (20ns), so toggling pins at the target speed is possible. I do realize, though, that on the RPi the processor is much closer to the pins and does not have to go through all the driver and bus layers that Windows does. I have also looked online for FPGA PCIe boards, but there are so many that I am unsure which one to buy, and very few datasheets list the I/O switching speed.
Therefore I need a recommendation on what to do, or a new board. The ideal board has at least 28 output "pins" and 8 input "pins", and setting the "pins" should take around 0.2us, measured from calling the function to actually seeing the signal on an oscilloscope. Ideally it costs around 500 CAD, but it could cost more if needed.