I have this table (generated with an online LaTeX table generator):
\begin{table}[]
\centering
\caption{My caption}
\label{my-label}
\begin{tabular}{|l|l|l|}
\hline
Technique & Possible Advantages & Possible Disadvantages \\ \hline
ANN & Excellent overall calibration error \cite{tollenaar_which_2013}; High prediction accuracy \cite{mair_investigation_2000}, \cite{tollenaar_which_2013}, \cite{percy_predicting_2016} & Neural nets continuously reuse and combine the input variables through multiple analytical layers, which can make the learning process slow \cite{hardesty_explained:_2017}; They can get very complicated very quickly, making them hard to interpret \cite{percy_predicting_2016} \\ \hline
KMeans & Clustering provides the functionality to discover and analyse any groups that have formed organically, rather than defining the groups before looking at the data \cite{trevino_introduction_2016} & Due to its high sensitivity to the starting points of the cluster centres, several runs are needed to obtain an optimal solution \cite{likas_global_2003} \\ \hline
KNN & Simple implementation; Very flexible and adaptable due to its non-parametric property (no assumptions made about the underlying distribution of the data) \cite{noauthor_k-nearest_2017}; An instance-based, lazy learning algorithm, meaning that it does not generalise using the training data \cite{larose_knearest_2014} & More computationally expensive than traditional models such as logistic regression and linear regression \cite{henley_k-nearest-neighbour_1996} \\ \hline
RF & Efficient execution on large data sets \cite{breiman_random_2001}; Handles numerous input variables without deletion \cite{breiman_random_2001}; Balances the error in class populations \cite{breiman_random_2001}; Does not overfit the data because of the Law of Large Numbers \cite{breiman_random_2001}; Very good for variable importance, since every variable gets the chance to appear in different contexts with different covariates \cite{strobl_introduction_2009} & Possible overfitting concern \cite{segal_machine_2003}, \cite{philander_identifying_2014}, \cite{luellen_propensity_2005}; Complicated to interpret because the individual trees are not organised within the forest, i.e. there is no nesting structure, since every predictor may appear in different positions or even different trees \cite{strobl_introduction_2009} \\ \hline
DT & Computationally efficient, flexible, and intuitively simple to implement \cite{friedl_decision_1997}; Robust and insensitive to noise \cite{friedl_decision_1997}; Simple to interpret and visualise using basic data-analysis techniques \cite{friedl_decision_1997} & Readily susceptible to overfitting \cite{gupta_decision_2017}; Sensitive to variance \cite{gupta_decision_2017} \\ \hline
ERT & Computationally quicker than random forest, with similar performance \cite{geurts_extremely_2006} & A high number of noisy features in the dataset was noted by the authors to negatively affect the algorithm's overall performance \cite{geurts_extremely_2006} \\ \hline
RGF & Does not require the number of trees as a hyper-parameter, since it is calculated automatically from the loss-function minimisation \cite{noauthor_introductory_2018}; Excellent prediction accuracy \cite{johnson_learning_2014} & Slower training time \cite{johnson_learning_2014} \\ \hline
SVM & Determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems \cite{noel_bambrick_support_2016}; Deals efficiently with datasets containing fewer samples \cite{guyon_gene_2002} & Efficiency drops significantly with noisier data \cite{noel_bambrick_support_2016}; Highly computationally expensive, resulting in slow training speeds \cite{noauthor_understanding_2017}; Selecting the right kernel hyper-parameter is vital when tuning this model and can be considered a setback, as noted in \cite{fradkin_dimacs_nodate}, \cite{burges_tutorial_1998} \\ \hline
LOGREG & Fitting in cases where the predicted variable is dichotomous (can be split into two classes, i.e., binary) \cite{statistics_solutions_what_2017}; Accessible development \cite{rouzier_direct_2009} & Overfitting, especially when the number of parameters grows too large, which in turn makes the algorithm highly inefficient \cite{philander_identifying_2014} \\ \hline
BAGGING & Equalises the impact of sharp observations, which improves performance in the case of weak points \cite{grandvalet_bagging_2004} & Equalises the impact of sharp observations, which harms performance in the case of strong points \cite{grandvalet_bagging_2004} \\ \hline
ADABOOST & Performs well and is quite fast \cite{freund_short_1999}; Simple to implement, especially since it requires no tuning parameters apart from the number of iterations \cite{freund_short_1999}; Can be combined with any base learning algorithm since it requires no prior knowledge of the weak learner \cite{freund_short_1999} & Each weak learner must perform slightly better than random for the exponential drop in training error to be observed \cite{freund_short_1999} \\ \hline
XGB & Sparsity-aware operation \cite{analytics_vidhya_which_2017}; Offers a constructive cache-aware architecture for `out-of-core' tree generation \cite{analytics_vidhya_which_2017}; Can also detect non-linear relations in datasets that contain missing values \cite{chen_xgboost:_2016} & Slower execution speed than LightGBM \cite{noauthor_lightgbm:_2018} \\ \hline
LGB & Fast and highly accurate performance \cite{analytics_vidhya_which_2017} & Higher loss-function value \cite{wang_lightgbm:_2017} \\ \hline
ELM & Simple and efficient \cite{huang_extreme_2006}; Rapid learning process \cite{huang_extreme_2011}; Solves problems straightforwardly \cite{huang_extreme_2006} & No (or only slight) improvement in generalisation performance \cite{huang_extreme_2006}, \cite{huang_extreme_2011}, \cite{huang_real-time_2006}; Preventing overfitting would require adaptation as the algorithm learns \cite{huang_extreme_2006}; Lack of deep-learning functionality (only one level of abstraction) \\ \hline
LDA & Strong assumptions with equal covariances \cite{yan_comparison_2011}; Lower computational cost compared to similar algorithms \cite{fisher_use_1936}, \cite{li_2d-lda:_2005}; Mathematically robust \cite{fisher_use_1936} & Assumptions are sometimes disrupted to produce good results \cite{yan_comparison_2011}; Limitations in image classification \cite{li_2d-lda:_2005}; The LD function sometimes gives values less than 0 or more than 1 \cite{yan_comparison_2011} \\ \hline
LR & Simple to implement/understand \cite{noauthor_learn_2017}; Can be used to determine the relationship between features \cite{noauthor_learn_2017}; Optimal when relationships are linear; Able to determine the cost of the influence of the variables \cite{noauthor_advantages_nodate} & Prone to overfitting \cite{noauthor_disadvantages_nodate-1}, \cite{noauthor_learn_2017}; Very sensitive to outliers \cite{noauthor_learn_2017}; Limited to linear relationships \cite{noauthor_disadvantages_nodate-1} \\ \hline
TS & Allows analysis of confidence intervals \cite{fernandes_parametric_2005}; Robust to outliers \cite{fernandes_parametric_2005}; Very efficient when the error distribution is discontinuous (distinct classes) \cite{peng_consistency_2008} & Computationally complex \cite{plot.ly_theil-sen_2015}; Loses some mathematical properties by working on random subsets \cite{plot.ly_theil-sen_2015}; Bias is an issue when the error is heteroscedastic \cite{wilcox_simulations_1998} \\ \hline
RIDGE & Prevents overfitting \cite{noauthor_complete_2016}; Performs well even with highly correlated variables \cite{noauthor_complete_2016}; Coefficient shrinkage reduces the model's complexity \cite{noauthor_complete_2016} & Does not remove irrelevant features, but only minimises them \cite{chakon_practical_2017} \\ \hline
NB & Simple and highly scalable \cite{hand_idiots_2001}; Performs well (even with strong dependencies) \cite{zhang_optimality_2004} & Can be biased \cite{hand_idiots_2001}; Cannot learn relationships between features (assumes feature independence) \cite{hand_idiots_2001}; Low precision and sensitivity with smaller datasets \cite{g._easterling_point_1973} \\ \hline
SGD & Can be used as an efficient optimisation algorithm \cite{noauthor_overview_2016}; Versatile and simple \cite{bottou_stochastic_2012}; Efficient at solving large-scale tasks \cite{zhang_solving_2004} & Slow convergence rate \cite{schoenauer-sebag_stochastic_2017}; Tuning the learning rate can be tedious and is very important \cite{vryniotis_tuning_2013}; Sensitive to feature scaling \cite{noauthor_disadvantages_nodate}; Requires multiple hyper-parameters \cite{noauthor_disadvantages_nodate} \\ \hline
\end{tabular}
\end{table}
I have a two-column paper, and I want it to look elegant. I am trying to implement this table as a supertabular; however, the table does not fit correctly into the page layout, and the text is illegible.
I am using this document layout:
\documentclass[a4paper, 10pt, conference]{ieeeconf}
Any ideas?
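For reference, this is roughly how I am invoking supertabular at the moment (a rough sketch only: the p-column widths are placeholder guesses and the rows are abbreviated):

\usepackage{supertabular} % in the preamble

% in the document body:
\tablecaption{My caption}
\tablefirsthead{\hline Technique & Possible Advantages & Possible Disadvantages \\ \hline}
\tablehead{\hline}
\tabletail{\hline}
\begin{supertabular}{|l|p{2.5cm}|p{3cm}|}
ANN    & Excellent overall calibration error \dots   & Neural nets \dots                 \\ \hline
KMeans & Clustering provides the functionality \dots & Due to its high sensitivity \dots \\ \hline
\end{supertabular}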
EDIT
Document structure:
I have six top-level sections. I want the table to appear in section 1, and I have some text that needs to be placed before the table in section 1.
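To make the intended structure clearer, here is a stripped-down sketch (the section titles and filler text are placeholders, not my real content):

\documentclass[a4paper, 10pt, conference]{ieeeconf}
\begin{document}
\section{First Section}  % placeholder title
Some text that should appear before the table, still in section 1.
% --> the comparison table should be placed here <--
\section{Second Section} % five more sections follow in the real paper
Placeholder text.
\end{document}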

Comments:

Your code contains several syntax errors, e.g. \cite{freund\_short\_1999\} instead of \cite{freund_short_1999}, that prevent it from being compilable. Please edit your code to fix these errors. Please also tell us which paper size you employ, which document class is in use, and how wide the margins are. – Mico May 26 '18 at 20:54

ieeeconf or IEEEconf? (LaTeX is case-sensitive.) – Mico May 27 '18 at 10:53