I have collected a dataset on a group of patients with a rare disease about which little is known. These patients can develop an outcome, X, which has occurred in 50 of the 300 patients. I want to find out which risk factors are associated with developing X in this disease. Because so little is known about the disease, I have hardly any hypotheses about which factors might matter, apart from the usual suspects such as smoking. I have collected over 30 variables (candidate risk factors). I understand the rule of thumb that you should include only about one predictor for every 10 events, which with 50 events allows roughly 5 predictors. However, I really only want to identify possible risk factors so I can help these patients; even a clue about which factors might matter would help.
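For reference, the "one predictor per 10 events" arithmetic can be written out as a tiny helper. This is only a sketch of the common events-per-variable (EPV) rule of thumb; the function name is hypothetical, and it assumes that for a binary outcome the limiting count is the rarer of events and non-events.

```python
def max_candidate_predictors(n_events: int, n_total: int, epv: int = 10) -> int:
    """Rough cap on the number of regression coefficients under the
    events-per-variable (EPV) rule of thumb for a binary outcome.
    (Hypothetical helper for illustration.)"""
    # For a binary outcome the limiting count is the rarer of the two classes.
    limiting = min(n_events, n_total - n_events)
    return limiting // epv


# 50 events among 300 patients: min(50, 250) = 50, and 50 // 10 = 5
print(max_candidate_predictors(50, 300))  # 5
```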
To reduce the number of variables entering the multivariable analysis, I was thinking of doing something I've seen in several other studies: they run a univariable analysis for each risk factor, select those with a p-value below 0.20 (where does that number come from?), and enter those into the multivariable analysis. The table then shows the results of both analyses. I've seen this in reputable journals such as NEJM. What are the caveats of doing it this way? Or could I do it another way? Any help is greatly appreciated.
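A minimal sketch of the screening procedure described above, run on simulated data. Everything here is an assumption for illustration: the variable names (`var1` … `var30`), the effect sizes, and the choice of two-sided Wald p-values from univariable logistic regression as the screening test are hypothetical and are not taken from the studies mentioned. Logistic regression is fitted by Newton-Raphson in NumPy so that no statistics package is needed.

```python
import math
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Logistic regression via Newton-Raphson.

    X is an (n, p) matrix of predictors; an intercept column is added here.
    Returns (coefficients, two-sided Wald p-values), intercept first.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X1 @ beta)))            # fitted probabilities
        info = X1.T @ (X1 * (mu * (1.0 - mu))[:, None])     # Fisher information
        step = np.linalg.solve(info, X1.T @ (y - mu))       # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
    cov = np.linalg.inv(X1.T @ (X1 * (mu * (1.0 - mu))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, p


def univariable_screen(X, y, names, threshold=0.20):
    """Keep each candidate whose univariable Wald p-value is below threshold."""
    kept = []
    for j, name in enumerate(names):
        _, p = fit_logistic(X[:, [j]], y)
        if p[1] < threshold:
            kept.append(name)
    return kept


# Hypothetical data: 300 patients, 30 standardized candidate risk factors,
# of which only var1 and var2 truly affect the outcome.
rng = np.random.default_rng(0)
n, k = 300, 30
X = rng.normal(size=(n, k))
names = [f"var{j + 1}" for j in range(k)]
true_logit = -1.8 + 0.8 * X[:, 0] + 0.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Step 1: univariable screen at p < 0.20; step 2: multivariable model on survivors
selected = univariable_screen(X, y, names)
cols = [names.index(s) for s in selected]
beta, p = fit_logistic(X[:, cols], y)

print("events:", int(y.sum()))
print("screened in:", selected)
for name, b, pv in zip(selected, beta[1:], p[1:]):
    print(f"{name}: OR = {math.exp(b):.2f}, p = {pv:.3f}")
```

Because the p-values of variables that are truly unrelated to the outcome are roughly uniformly distributed, a 0.20 cutoff lets through about one in five of them, so in a simulation like this several of the 28 hypothetical null variables will typically be screened in alongside `var1` and `var2`.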