How to select a bunch of optimized data from a larger data set?

Question

How can I select an optimized subset of data from a larger data set? I have a data set of products that are sold to customers. Because the products have been sold before, each one has associated information such as reviews, ratings, and the number of units sold. I want to select an optimized subset of products from the larger data set, based on units sold, ratings, reviews, and the other available information. The goal is to increase the number of units sold in the future. How can I do this? What kinds of models, methods, or algorithms apply here? If this is a machine learning problem, which machine learning tools should I use? Thank you so much.

You can definitely do sales prediction and analytics. But what do you really mean by "select a bunch of optimized data from a larger data set"? — Dawny33, Nov 23 '15 at 16:51
I have some products, and each product has its own sales information, such as the number of units sold so far, reviews, and ratings. I want to select a subset of the products to recommend to retailers in order to increase the number of units the retailers sell. I want it to be an "optimized" subset, one that increases the retailers' unit sales as much as possible. — user3634601, Nov 23 '15 at 16:57
Unless you have a compelling reason why, generally you do not want to subset data. Rather, you will train on all data. I think the optimization you are referring to is the ability to provide an optimal recommendation. For this you could look into collaborative filtering techniques. — , Nov 23 '15 at 19:24
Thank you so much. For collaborative filtering, there is a project about it: http://blog.yhathq.com/posts/recommender-system-in-r.html. I took a look, but that approach needs a reference data point to start from. Here we do not have a reference data point, so I am wondering how to proceed. Thank you. — user3634601, Nov 23 '15 at 21:17
Nice blog btw... By reference data point do you mean that you do not have a similarity measure for users in your system? You will create one, e.g. in your case it might be finding the Jaccard similarity of users based on their purchases. See http://datascience.stackexchange.com/questions/8873/user-product-positive-click-data-available-how-to-generate-negative-no-click/8894#8894 for links on collaborative filtering and creation of a utility matrix. — , Nov 23 '15 at 21:31
By reference data point I mean something like an input, a target data point in the system. In the beer system, for example, we try to recommend beer to a user, so we first need some information about that user: the metrics of the beers the user likes. Based on those metrics, we calculate which beers to recommend. Here we do not have a particular user, or any information about the products a user likes. We are just doing machine learning on a larger data set that contains the sales data. — user3634601, Nov 23 '15 at 23:00
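The Jaccard similarity suggested in the comments above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase log, the user and product IDs, and the helper names (`build_baskets`, `jaccard`, `user_similarities`) are all hypothetical and do not come from the asker's data. It assumes purchase records are available per user, which the thread does not confirm.

```python
from itertools import combinations

# Hypothetical purchase log of (user_id, product_id) pairs.
# The asker's real data set and its column names are not known.
purchases = [
    ("u1", "p1"), ("u1", "p2"), ("u1", "p3"),
    ("u2", "p2"), ("u2", "p3"),
    ("u3", "p4"),
]


def build_baskets(pairs):
    """Group purchases into one set of product IDs per user."""
    baskets = {}
    for user, product in pairs:
        baskets.setdefault(user, set()).add(product)
    return baskets


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def user_similarities(baskets):
    """Compute the Jaccard similarity for every unordered pair of users."""
    return {
        (u, v): jaccard(baskets[u], baskets[v])
        for u, v in combinations(sorted(baskets), 2)
    }


baskets = build_baskets(purchases)
similarities = user_similarities(baskets)
for pair, score in sorted(similarities.items()):
    print(pair, round(score, 3))
```

Users with overlapping purchases get a score above zero, and users with nothing in common get zero. The same `jaccard` function could be applied to products instead of users, by grouping the log by product and comparing the sets of buyers, which may fit better when there is no specific target user.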

0 Answers