How to generate data for test?

Question

I am working on the following problem:

I have a scheduling problem with time-varying capacities and other characteristics that are not treated in classic Job Shop or Flow Shop.

How to generate such data in order to test algorithms ? I am not looking to solve the problem.

P.S: I took a look in literature for data sets from job shop, rcpsp and flow shop problems. I couldn't find all the aspects I wanted.

Welcome to OR.SE. As you mentioned each task has a precedence relation and it can be performed by a set of specific machines, it sounds like a hybrid scheduling problem. You can model it by mixed-integer programming and solve it by a standard MIP solver. In this case, I think, using simulation might be a bit complicated. — A.Omidi, Nov 13 '20 at 19:25
@A.Omidi I am not looking to solve the problem but to generate data. — Approximations, Nov 13 '20 at 21:13
Are you modeling an actual facility (for example, a real-world paper mill), or at least an actual type of facility (for example, paper mills similar to a specific real one)? — prubin, Nov 13 '20 at 21:53
@prubin No, the problem is generic, it should be roughly modeling a lot of facilities. I took a look in literature for data sets from job shop, rcpsp and flow shop problems. I couldn't find all the aspects I wanted, either there is no due dates, or no time-varying availabilities etc. — Approximations, Nov 13 '20 at 22:00
I am curious, how the question of modeling or not an actualspecefic facility could make a difference. — Approximations, Nov 13 '20 at 22:06
@Approximations, I answered your doubt to use simulation or other methods like MIP. If you are interested to generate data and it is based on your specific problem, one possible way is to use some references like this and adding other needs which may not be in the references. For example, if you have a release date/due date for each job you will need to change some of the problem constraints/objective function based on JIT concepts and so on. — A.Omidi, Nov 14 '20 at 08:52
@A.Omidi what about introducing non availabilities on machines? — Approximations, Nov 14 '20 at 18:01
@Approximations, If you mean is something like machine eligibility constraint, one possible way to do that is using a specific set/binary parameter and filter corresponding constraints based on. Also, the nice answer from Prof. Rubin would be interesting. — A.Omidi, Nov 15 '20 at 09:04

score 3 · Answer 1 · answered Nov 14 '20 at 21:43

The first step is to ascertain "reasonable" configurations of the facility being modeled. Configuration includes physical characteristics (types of machines, number of machines of each type, typical availability v. down time, etc.) as well as information about the demand patterns (types of jobs, arrival rates of jobs, typical due dates, etc.). For random quantities, such as job arrivals or machine failures, you would look for an appropriate distribution (e.g., exponential interarrival times for jobs) and "average" or "typical" parameters (mean arrival rate, mean time between arrivals, ...). If you were modeling an existing (real-world) facility, you would just use the configuration data from that facility and best estimates of the demand data.

Since you are not modeling an actual facility, you could try looking for literature in which similar facilities are modeled, finding ones that are comparable, and parameterizing yours similarly to what has been published. Note that the published problems do not need to be identical. So, for instance, if your problem includes machine breakdowns and repairs and you find a published instance A with most of your features but not breakdowns, and another instance B that is somewhat similar to your problem and includes breakdowns but not some other thing you have, you could try mixing the breakdown rates from B with the other characteristics of A.

An alternative approach is to build a discrete simulation model for a facility of the type you are modeling. You would pick "dimensions" (number of machine types, number of machines etc.) that you think are reasonable, assign plausible distributions to random quantities (time between job arrivals, time between machine breakdowns, ...), then tune parameters such as mean interarrival time or mean time between breakdowns by running the simulation with different parameter combinations until you get a few combinations that result in reasonable system behavior (the facility is not starved for work, the queue does not explode, ...). That will give you some specific facility profiles you can use for testing your scheduling model/algorithm.

If your goal is to publish results, you need to be careful that your test problems look "reasonable" to a reviewer and do not look as if they were cherry-picked to produce good results with your model/algorithm. That's why basing your problems on a real-world facility is most desirable, and basing them on published instances for comparable models is second most desirable.

How to generate data for test?

1 Answers1

Linked