Camera setup: We have set up several stereo cameras in an apartment to monitor the indoor environment. The cameras use wide-angle lenses (3.5 mm) to cover a large volume.
The cameras are mounted about 2.8 meters above the floor. The objects (such as a mug, bottle, or telephone) are at least 3 meters away. For instance, an object of 75 mm x 102 mm located 3 meters from the camera occupies only about 15 x 20 px in the image, and objects farther away appear even smaller.
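To make the size figures concrete, here is a rough sketch of the arithmetic under a simple pinhole-camera assumption (no lens distortion, object facing the camera). The pixel pitch of the sensor is not stated here, so it is back-computed from the 15 px width quoted above; the function names are only illustrative.

```python
FOCAL_MM = 3.5  # lens focal length from the setup description


def implied_pixel_pitch_mm(object_mm, distance_mm, focal_mm, size_px):
    """Pixel pitch that makes the pinhole model reproduce an observed size."""
    return focal_mm * object_mm / distance_mm / size_px


def projected_size_px(object_mm, distance_mm, focal_mm, pixel_pitch_mm):
    """Pinhole model: size on sensor = focal * object / distance, then to pixels."""
    return focal_mm * object_mm / distance_mm / pixel_pitch_mm


# Assumption: pitch derived from "75 mm wide object at 3 m covers 15 px".
pitch = implied_pixel_pitch_mm(75.0, 3000.0, FOCAL_MM, 15.0)

for distance_mm in (3000.0, 4000.0, 5000.0, 6000.0):
    w = projected_size_px(75.0, distance_mm, FOCAL_MM, pitch)
    h = projected_size_px(102.0, distance_mm, FOCAL_MM, pitch)
    print(f"{distance_mm / 1000:.0f} m: {w:.1f} x {h:.1f} px")
```

Under this model the projected size is inversely proportional to distance, so the same object at 6 meters would cover only about half as many pixels in each dimension.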
I have around 30 different objects to be recognized.
I do not use the depth information, because it is not accurate enough. I use only the RGB values from a single camera. Image resolution is 1360 x 1024 pixels.
Approaches:
1. Point detectors/descriptors with matching (some models in the database, checking the matches one by one)
2. Bag of visual words + SVM classification (5 object categories)
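For reference, the encoding step of the bag-of-visual-words approach can be sketched as below. This is only a minimal illustration, not my actual pipeline: the codebook would normally come from clustering (e.g. k-means) real local descriptors, and the 2-D "descriptors" and function names here are made up for the example. The resulting normalized histogram is the feature vector that would be passed to the SVM.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest (Euclidean distance) to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Count how often each visual word is the nearest one, then normalize to sum 1."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data (hypothetical): three visual words and four descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 1.0), (0.5, -0.5), (1.0, 9.0)]

print(bovw_histogram(descriptors, codebook))
```

With objects of only 15 x 20 px, very few keypoints are detected per object, so such histograms end up built from a handful of descriptors.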
I have experience with Haar cascades, but I have never tried them for this problem.
What methods or approaches should I investigate?
Thank you in advance.