Transforming a table to a supertabular

Question

I have this table (generated from an online latex table generator):

\begin{table}[]
\centering
\caption{My caption}
\label{my-label}
\begin{tabular}{|l|l|l|}
\hline
Technique & Possible Advantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 & Possible Disadvantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                         \\ \hline
ANN       & Excellent overall calibration error \cite{tollenaar_which_2013}high prediction accuracy \cite{mair_investigation_2000\}, \cite{tollenaar_which_2013}, \cite{percy_predicting_2016}                                                                                                                                                                                                                                                                                                                                       & Neural nets continuously reuse and perform combinations of the input variable through multiple analytical layers, which could make the learning process slow at times \cite{hardesty_explained:_2017}can get very complicated very quickly, making it slightly hard to interpret\cite{percy_predicting_2016}                                                                                                                                             \\ \hline
KMeans    & Clustering provides the functionality to discover and analyse any groups that have formed organically rather than defining the groups before looking at the data \cite{trevino_introduction_2016}                                                                                                                                                                                                                                                                                                                                                                                & due to its high sensitivity to the starting points of the clustering centres, several runs would be indispensable to procure an optimal solution \cite{likas_global_2003}                                                                                                                                                                                                                                                                                                   \\ \hline
KNN       & simplistic implementation.KNNs are considered to be very flexible and adaptable due to its non-parametric property (no assumptions made on the underlying distribution of the data) \cite{noauthor_k-nearest_2017}KNN is also an instance-based, lazy learning algorithm meaning that it does not generalise using the training data \cite{larose_knearest_2014}                                                                                                                                                                                              & this algorithm is more computationally expensive than traditional models (logistic regression and linear regression) \cite{henley_k-nearest-neighbour_1996}                                                                                                                                                                                                                                                                                                                 \\ \hline
RF        & efficient execution on large data sets \cite{breiman_random_2001}handling numerous input variables without deletion \cite{breiman_random_2001}balancing the error in class populations \cite{breiman_random_2001}random forests do not overfit data because of the law of Large Numbers \cite{breiman_random_2001}Very good for variable importance (since this algorithm gives every variable the chance to appear in different contexts with different covariates) \cite{strobl_introduction_2009} & Possible overfitting concern \cite{segal_machine_2003}, \cite{philander_identifying_2014}, \cite{luellen_propensity_2005}complicated to interpret because there is no organisational manner by which the single trees disperse inside the forest, i.e. there is no nesting structure whatsoever - since every predictor may appear in different positions, or even trees \cite{strobl_introduction_2009}           \\ \hline
DT        & very computationally efficient, flexible, and also intuitively simple to implement \cite{friedl_decision_1997}robust and insensitive to noise \cite{friedl_decision_1997}simple to interpret and visualise by using simple data analytical techniques \cite{friedl_decision_1997}                                                                                                                                                                                                                                                          & can be readily susceptible to overfitting \cite{gupta_decision_2017}sensitive to variance \cite{gupta_decision_2017}                                                                                                                                                                                                                                                                                                                                     \\ \hline
ERT       & computationally quicker than random forest  with similar performance \cite{geurts_extremely_2006}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                & if the dataset contains a high number of noisy features, which was noted by the authors to have negatively affected the algorithm's overall performance \cite{geurts_extremely_2006}                                                                                                                                                                                                                                                                                        \\ \hline
RGF       & does not require the number of trees to build a hyper-parameter due to automatically calculating it as a result of the loss function minimisation \cite{noauthor_introductory_2018}Excellent prediction accuracy \cite{johnson_learning_2014}                                                                                                                                                                                                                                                                                                                 & slower training time \cite{johnson_learning_2014}                                                                                                                                                                                                                                                                                                                                                                                                                           \\ \hline
SVM       & Based on the concept of determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems \cite{noel_bambrick_support_2016}efficiently deal with datasets containing fewer samples \cite{guyon_gene_2002}                                                                                                                                                                                                                                                                            & tend to reduce efficiency significantly with noiser data \cite{noel_bambrick_support_2016}highly computationally expensive, resulting in slow training speeds \cite{noauthor_understanding_2017}Selecting the right kernel hyper-parameter plays a vital role in tuning this model and can also be considered as a setback of this model, as also noted \cite{fradkin_dimacs_nodate}, \cite{burges_tutorial_1998} \\ \hline
LOGREG    & fitting in cases where the predictor is dichotomous (can be split into two clusters, i.e., binary) \cite{statistics_solutions_what_2017}accessible development \cite{rouzier_direct_2009}                                                                                                                                                                                                                                                                                                                                                                    & overfitting - especially when the amount of parameter values increases too much - which in turn makes the algorithm highly inefficient \cite{philander_identifying_2014\                                                                                                                                                                                                                                                                                                    \\ \hline
BAGGING   & equalises the impact of sharp observations which improves performance in the case of weak points \cite{grandvalet_bagging_2004}                                                                                                                                                                                                                                                                                                                                                                                                                                                  & equalises the impact of sharp observations which harms performance in the case of strong points \cite{grandvalet_bagging_2004}                                                                                                                                                                                                                                                                                                                                              \\ \hline
ADABOOST  & performs well and quite fast \cite{freund_short_1999}pretty simple to implement - especially since it requires no tuning parameters to work (only the number of iterations) \cite{freund_short_1999}can be dynamically cohered with every base learning algorithm since it does not require any prior understanding of the weak points \cite{freund_short_1999}                                                                                                                                                                            & initial weak point weighting was slightly better than random, then an exponential drop in the training error was observed \cite{freund_short_1999}                                                                                                                                                                                                                                                                                                                          \\ \hline
XGB       & sparsity-aware operation \cite{analytics_vidhya_which_2017}offers a constructive cache-aware architecture for 'out-of-core' tree generation \cite{analytics_vidhya_which_2017}can also detect non-linear relations in datasets that contain missing values \cite{chen_xgboost:_2016}                                                                                                                                                                                                                                                     & Slower execution speed than LightGBM \cite{noauthor_lightgbm:_2018}                                                                                                                                                                                                                                                                                                                                                                                                         \\ \hline
LGB       & fast and highly accurate performances \cite{analytics_vidhya_which_2017}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        & higher loss function value \cite{wang_lightgbm:_2017}                                                                                                                                                                                                                                                                                                                                                                                                                       \\ \hline
ELM       & simple and efficient \cite{huang_extreme_2006}rapid learning process \cite{huang_extreme_2011}solves straightforwardly \cite{huang_extreme_2006}                                                                                                                                                                                                                                                                                                                                                                                           & No generalisation performance improvement (or slight improvement) \cite{huang_extreme_2006}, \cite{huang_extreme_2011}, \cite{huang_real-time_2006}preventing overfitting would require adaptation as the algorithm learns \cite{huang_extreme_2006}lack of deep-learning functionality (only one level of abstraction)                                                                                            \\ \hline
LDA       & Strong assumptions with equal covariances \cite{yan_comparison_2011}Lower computational cost compared to similar algorithms \cite{fisher_use_1936}, \cite{li_2d-lda:_2005}Mathematically robust \cite{fisher_use_1936}                                                                                                                                                                                                                                                                                                  & Assumptions are sometimes disrupted to produce good results \cite{yan_comparison_2011}. \& Image Classification \cite{li_2d-lda:_2005}LD function sometimes results less then 0 or more than 1 \cite{yan_comparison_2011}                                                                                                                                                                                                             \\ \hline
LR        & Simple to implement/understand \cite{noauthor_learn_2017}Can be used to determine the relationship between features \cite{noauthor_learn_2017}Optimal when relationships are linear.Able to determine the cost of the influence of the variables \cite{noauthor_advantages_nodate}                                                                                                                                                                                                                                                         & Prone to overfitting \cite{noauthor_disadvantages_nodate-1}, \cite{noauthor_learn_2017}Very sensitive to outliers \cite{noauthor_learn_2017}Limited to linear relationships \cite{noauthor_disadvantages_nodate-1}                                                                                                                                                                                                 \\ \hline
TS        & Analytics of confidence intervals \cite{fernandes_parametric_2005}Robust to outliers \cite{fernandes_parametric_2005}Very efficient when error distribution is discontinuous (distinct classes) \cite{peng_consistency_2008}                                                                                                                                                                                                                                                                                                               & Computationally complex \cite{plot.ly_theil-sen_2015}Loses some mathematical properties by working on random subsets \cite{plot.ly_theil-sen_2015}When a heteroscedastic error, biasedness is an issue \cite{wilcox_simulations_1998}                                                                                                                                                                                                 \\ \hline
RIDGE     & Prevents overfitting \cite{noauthor_complete_2016}Performs well (even with highly correlated variables) \cite{noauthor_complete_2016}Co-efficient shrinkage (reduces the model's complexity) \cite{noauthor_complete_2016}                                                                                                                                                                                                                                                                                                                 & Does not remove irrelevant features, but only minimises them \cite{chakon_practical_2017}                                                                                                                                                                                                                                                                                                                                                                                   \\ \hline
NB        & Simple and highly scalable \cite{hand_idiots_2001}Performs well (even with strong dependencies) \cite{zhang_optimality_2004}                                                                                                                                                                                                                                                                                                                                                                                                                                  & Can be biased \cite{hand_idiots_2001}Cannot learn relationships between features (assumes feature independence) \cite{hand_idiots_2001}Low precision and sensitivity with smaller datasets \cite{g._easterling_point_1973}                                                                                                                                                                                                           \\ \hline
SGD       & Can be used as an efficient optimisation algorithm \cite{noauthor_overview_2016}Versatile and simple \cite{bottou_stochastic_2012}Efficient at solving large-scale tasks \cite{zhang_solving_2004}                                                                                                                                                                                                                                                                                                                                         & Slow convergence rate \cite{schoenauer-sebag\_stochastic\_2017\}Tuning the learning rate can be tedious and is very important \cite{vryniotis_tuning_2013}Sensitive to feature scaling \cite{noauthor_disadvantages_nodate}Requires multiple hyper-parameters \cite{noauthor_disadvantages_nodate}                                                                                                                    \\ \hline
\end{tabular}
\end{table}

I have a two column paper, and I want it to be elegant. I am trying to implement this table as a supertabular, however, the table is not fitted correctly into the page layout, and the text is also illegible.

I am using this document layout:

\documentclass[a4paper, 10pt, conference]{ieeeconf}

Any ideas?

EDIT

Document structure:

I have 6 sections (section levels). I want the table to be in section 1. I have some text to be placed before the table in section 1.

Mmm… I suppose you wnat this table to spread over both columns? — Bernard, May 26 '18 at 20:52
Your code contains a lot of simple errors -- e.g., \cite{freund\_short\_1999\} instead of \cite{freund_short_1999} -- that prevent it from being compilable. Please edit your code to fix these errors. Please also tell us which paper size you employ, which document class is in use, and how wide the margins are. — Mico, May 26 '18 at 20:54
Is the name of the document class ieeeconf or IEEEconf? (LaTeX is case-sensitive.) — Mico, May 27 '18 at 10:53

score 3 · Answer 1 · answered May 26 '18 at 22:16

Given your document class and paper size, the table basically fits on two full pages -- if table* environments are used. Not much to be gained, really, from trying to be fancy; just use two separate table* environments. Use full-width tabularx environments inside the table* environments. About the citation call-outs: use a single \cite instruction with 2 (or sometimes 3) arguments.

Do strive, though, to give the table an open and inviting look, mainly by getting rid of all vertical lines, using vertical whitespace instead of \hline in all interior cases, and using the line-drawing macros of the booktabs package (\toprule, \midrule, and \bottomrule) for the few remaining horizontal lines. Your readers will much appreciate this effort -- and will likely reward you by actually reading [yes!] what's in the tables.

The following screenshot shows just the first few cells of the two-page table.

\documentclass[a4paper, 10pt, conference]{IEEEconf}
\usepackage{tabularx,booktabs,ragged2e,caption}
\newcolumntype{L}{>{\RaggedRight\arraybackslash}X}
\captionsetup{skip=0.333\baselineskip}
\begin{document}
\begin{table*}
\caption{My caption}
\label{my-label}
\begin{tabularx}{\textwidth}{@{} lLL @{}}
\toprule
Technique & Possible Advantages & Possible Disadvantagess                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
\\ \midrule
ANN       & Excellent overall calibration error~\cite{tollenaar_which_2013}. High prediction accuracy~\cite{mair_investigation_2000, tollenaar_which_2013, percy_predicting_2016}.                                                                                                                                                                                                                                                                                                                                       
& Neural nets continuously reuse and perform combinations of the input variable through multiple analytical layers, which could make the learning process slow at times~\cite{hardesty_explained:_2017}. Can get very complicated very quickly, making it slightly hard to interpret~\cite{percy_predicting_2016}.
\\ \addlinespace
KMeans    & Clustering provides the functionality to discover and analyse any groups that have formed organically rather than defining the groups before looking at the data~\cite{trevino_introduction_2016}.                                                                                                                                                                                                                                                                                                                                                                                
& Due to its high sensitivity to the starting points of the clustering centres, several runs would be indispensable to procure an optimal solution~\cite{likas_global_2003}.
\\ \addlinespace
KNN       & Simplistic implementation. KNNs are considered to be very flexible and adaptable due to its non-parametric property (no assumptions made on the underlying distribution of the data)~\cite{noauthor_k-nearest_2017}. KNN is also an instance-based, lazy learning algorithm meaning that it does not generalise using the training data~\cite{larose_knearest_2014}.
& This algorithm is more computationally expensive than traditional models (logistic regression and linear regression)~\cite{henley_k-nearest-neighbour_1996}.
\\ \addlinespace
RF        & Efficient execution on large data sets~\cite{breiman_random_2001}. Handling numerous input variables without deletion~\cite{breiman_random_2001}. Balancing the error in class populations~\cite{breiman_random_2001}.  Random forests do not overfit data because of the law of Large Numbers~\cite{breiman_random_2001}. Very good for variable importance (since this algorithm gives every variable the chance to appear in different contexts with different covariates)~\cite{strobl_introduction_2009}. 
& Possible overfitting concern~\cite{segal_machine_2003, philander_identifying_2014, luellen_propensity_2005}. Complicated to interpret because there is no organisational manner by which the single trees disperse inside the forest, i.e. there is no nesting structure whatsoever---since every predictor may appear in different positions, or even trees~\cite{strobl_introduction_2009}.
\\ \addlinespace
DT        & Very computationally efficient, flexible, and also intuitively simple to implement~\cite{friedl_decision_1997}. Robust and insensitive to noise~\cite{friedl_decision_1997}. Simple to interpret and visualise by using simple data analytical techniques~\cite{friedl_decision_1997}.
& Can be readily susceptible to overfitting~\cite{gupta_decision_2017}. Sensitive to variance~\cite{gupta_decision_2017}                                                                                                                                                                                                                                                                                                                                     
\\ \addlinespace
ERT       & Computationally quicker than random forest  with similar performance~\cite{geurts_extremely_2006}.
& If the dataset contains a high number of noisy features, which was noted by the authors to have negatively affected the algorithm's overall performance~\cite{geurts_extremely_2006}.
\\ \addlinespace
RGF       & Does not require the number of trees to build a hyper-parameter due to automatically calculating it as a result of the loss function minimisation~\cite{noauthor_introductory_2018}. Excellent prediction accuracy~\cite{johnson_learning_2014}.
& Slower training time~\cite{johnson_learning_2014}.
\\ \addlinespace
SVM       & Based on the concept of determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems~\cite{noel_bambrick_support_2016}. Efficiently deal with datasets containing fewer samples~\cite{guyon_gene_2002}.
& Tend to reduce efficiency significantly with noiser data~\cite{noel_bambrick_support_2016}. Highly computationally expensive, resulting in slow training speeds~\cite{noauthor_understanding_2017}. Selecting the right kernel hyper-parameter plays a vital role in tuning this model and can also be considered as a setback of this model, as also noted~\cite{fradkin_dimacs_nodate, burges_tutorial_1998}.
\\ \addlinespace
LOGREG    & Fitting in cases where the predictor is dichotomous (can be split into two clusters, i.e., binary)~\cite{statistics_solutions_what_2017}. Accessible development~\cite{rouzier_direct_2009}.
& Overfitting---especially when the amount of parameter values increases too much---which in turn makes the algorithm highly inefficient~\cite{philander_identifying_2014}.
\\
\midrule
\multicolumn{3}{r@{}}{\em Cont'd on next page}\\
\end{tabularx}
\end{table*}

\begin{table*}
\ContinuedFloat
\caption{My caption (continued)}
\begin{tabularx}{\textwidth}{@{} lLL @{}}
\toprule
Technique & Possible Advantages & Possible Disadvantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
\\ \midrule
BAGGING   & Equalises the impact of sharp observations which improves performance in the case of weak points~\cite{grandvalet_bagging_2004}.
& Equalises the impact of sharp observations which harms performance in the case of strong points~\cite{grandvalet_bagging_2004}.
\\ \addlinespace
ADABOOST  & Performs well and quite fast~\cite{freund_short_1999}. Pretty simple to implement---especially since it requires no tuning parameters to work (only the number of iterations)~\cite{freund_short_1999}. Can be dynamically cohered with every base learning algorithm since it does not require any prior understanding of the weak points~\cite{freund_short_1999}.
& Initial weak point weighting was slightly better than random, then an exponential drop in the training error was observed~\cite{freund_short_1999}.
\\ \addlinespace
XGB       & Sparsity-aware operation~\cite{analytics_vidhya_which_2017}. Offers a constructive cache-aware architecture for `out-of-core' tree generation~\cite{analytics_vidhya_which_2017}. Can also detect non-linear relations in datasets that contain missing values~\cite{chen_xgboost:_2016}.
& Slower execution speed than LightGBM~\cite{noauthor_lightgbm:_2018}.
\\ \addlinespace
LGB       & Fast and highly accurate performances~\cite{analytics_vidhya_which_2017}.
& Higher loss function value~\cite{wang_lightgbm:_2017}.
\\ \addlinespace
ELM       & Simple and efficient~\cite{huang_extreme_2006}. Rapid learning process~\cite{huang_extreme_2011}. Solves straightforwardly~\cite{huang_extreme_2006}.
& No generalisation performance improvement (or slight improvement)~\cite{huang_extreme_2006, huang_extreme_2011, huang_real-time_2006}. Preventing overfitting would require adaptation as the algorithm learns~\cite{huang_extreme_2006}. Lack of deep-learning functionality (only one level of abstraction).
\\ \addlinespace
LDA       & Strong assumptions with equal covariances~\cite{yan_comparison_2011}. Lower computational cost compared to similar algorithms~\cite{fisher_use_1936,li_2d-lda:_2005}. Mathematically robust~\cite{fisher_use_1936}.
& Assumptions are sometimes disrupted to produce good results~\cite{yan_comparison_2011}. \& Image Classification~\cite{li_2d-lda:_2005}. LD function sometimes results less then~0 or more than~1~\cite{yan_comparison_2011}.
\\ \addlinespace
LR        & Simple to implement/understand~\cite{noauthor_learn_2017}. Can be used to determine the relationship between features~\cite{noauthor_learn_2017}. Optimal when relationships are linear. Able to determine the cost of the influence of the variables~\cite{noauthor_advantages_nodate}.
& Prone to overfitting~\cite{noauthor_disadvantages_nodate-1, noauthor_learn_2017}. Very sensitive to outliers~\cite{noauthor_learn_2017}. Limited to linear relationships~\cite{noauthor_disadvantages_nodate-1}.
\\ \addlinespace
TS        & Analytics of confidence intervals~\cite{fernandes_parametric_2005}. Robust to outliers~\cite{fernandes_parametric_2005}. Very efficient when error distribution is discontinuous (distinct classes)~\cite{peng_consistency_2008}.
& Computationally complex~\cite{plot.ly_theil-sen_2015}. Loses some mathematical properties by working on random subsets~\cite{plot.ly_theil-sen_2015}. When a heteroscedastic error, biasedness is an issue~\cite{wilcox_simulations_1998}.
\\ \addlinespace
RIDGE     & Prevents overfitting~\cite{noauthor_complete_2016}. Performs well (even with highly correlated variables)~\cite{noauthor_complete_2016}. Coefficient shrinkage (reduces the model's complexity)~\cite{noauthor_complete_2016}.
& Does not remove irrelevant features, but only minimises them~\cite{chakon_practical_2017}.
\\ \addlinespace
NB        & Simple and highly scalable~\cite{hand_idiots_2001} Performs well (even with strong dependencies)~\cite{zhang_optimality_2004}.
& Can be biased~\cite{hand_idiots_2001}. Cannot learn relationships between features (assumes feature independence)~\cite{hand_idiots_2001}. Low precision and sensitivity with smaller datasets~\cite{g._easterling_point_1973}.
\\ \addlinespace
SGD       & Can be used as an efficient optimisation algorithm~\cite{noauthor_overview_2016}. Versatile and simple~\cite{bottou_stochastic_2012}. Efficient at solving large-scale tasks~\cite{zhang_solving_2004}.                                                                                                                                                                                                                                                                                                                                          
& Slow convergence rate~\cite{schoenauer-sebag_stochastic_2017}. Tuning the learning rate can be tedious and is very important~\cite{vryniotis_tuning_2013}. Sensitive to feature scaling~\cite{noauthor_disadvantages_nodate}. Requires multiple hyper-parameters~\cite{noauthor_disadvantages_nodate}.
\\ \bottomrule
\end{tabularx}
\end{table*}
\end{document}

Thanks for this. It is nice, however, the table is not placed in the section that I want. I want it in section 2, and the table is placed after another section, when a new page begins. — David, May 27 '18 at 10:03
@DavidFarrugia - Have you tried placing the two table* environments earlier (about a page earlier) in the document? More generally, since you haven't provided much information about your document setup (other than that it's in some kind of two-column setup), I'm afraid I can't provide specific advice on how you should proceed. Feel free to share a bit of information about your document, where in the document the table occurs, whether you use chapter-level commands, etc. — Mico, May 27 '18 at 10:28
I will try that. I edited my post to include some further details about the document. If you require any more information, please ask me. — David, May 27 '18 at 10:42
@DavidFarrugia - I could suggest the following: (a) load the afterpage package in the preamble, and (b) enclose the two consecutive table* environments in an \afterpage{\begin{table*}...\end{table*} \begin{table*}...\end{table*}\clearpage} "wrapper". — Mico, May 27 '18 at 10:52
Tried this but the result is pretty ugly. I have empty columns, and still, the tables are displayed in the wrong position. Would thinks its better if I provide the some of tex file - enough to be compilable (excluding some sensitive information) ? — David, May 27 '18 at 11:11
@DavidFarrugia - I can only surmise that the tables you're trying to typeset are not the ones shown in the code above -- I am unable to replicate any of the issues you say you experience. Should I delete my answer as it's evidently of no use to you? — Mico, May 27 '18 at 11:35
I managed to make this solution near- perfect. The only problem left is that the section that follows continues on the right column. Is there a way to tell LaTex to re-start from the left column? And no, I do not think that you should delete your answer. It was well written, and I appreciate the time spent. — David, May 27 '18 at 11:38
@DavidFarrugia - I'm afraid I have no useful insights into what's going on in your document. Sorry. — Mico, May 27 '18 at 12:08

score 1 · Answer 2 · answered May 26 '18 at 21:20

Here is one possibility using the longtable package in combination with horizontal rules from booktabs.

\documentclass[a4paper]{IEEEconf} 
\usepackage{longtable}
\usepackage{booktabs}

\usepackage{ragged2e}
\usepackage{calc}
\usepackage{array}
\newcolumntype{P}{>{\RaggedRight}p{0.5\textwidth-1cm-3\tabcolsep}}

\begin{document}

\onecolumn
\setlength\extrarowheight{5pt}
\begin{longtable}{p{2cm}PP}
\caption{My caption}\\
\toprule
Technique & Possible Advantages & Possible Disadvantages \\
\midrule 
\endfirsthead
\caption{my caption on the following pages}\\
\toprule
Technique & Possible Advantages & Possible Disadvantages \\ \midrule
\endhead
ANN       & Excellent overall calibration error \cite{tollenaar_which_2013}high prediction accuracy \cite{mair_investigation_2000}, \cite{tollenaar_which_2013}, \cite{percy_predicting_2016}                                                                                                                                                                                                                                                                                                                                       & Neural nets continuously reuse and perform combinations of the input variable through multiple analytical layers, which could make the learning process slow at times \cite{hardesty_explained:_2017}can get very complicated very quickly, making it slightly hard to interpret\cite{percy_predicting_2016}                                                                                                                                            
 \\
KMeans    & Clustering provides the functionality to discover and analyse any groups that have formed organically rather than defining the groups before looking at the data \cite{trevino_introduction_2016}                                                                                                                                                                                                                                                                                                                                                                                & due to its high sensitivity to the starting points of the clustering centres, several runs would be indispensable to procure an optimal solution \cite{likas_global_2003}                                                                                                                                                                                                                                                                                                   \\ 
KNN       & simplistic implementation.KNNs are considered to be very flexible and adaptable due to its non-parametric property (no assumptions made on the underlying distribution of the data) \cite{noauthor_k-nearest_2017}KNN is also an instance-based, lazy learning algorithm meaning that it does not generalise using the training data \cite{larose_knearest_2014}                                                                                                                                                                                              & this algorithm is more computationally expensive than traditional models (logistic regression and linear regression) \cite{henley_k-nearest-neighbour_1996}                                                                                                                                                                                                                                                                                                                 \\ 
RF        & efficient execution on large data sets \cite{breiman_random_2001}handling numerous input variables without deletion \cite{breiman_random_2001}balancing the error in class populations \cite{breiman_random_2001}random forests do not overfit data because of the law of Large Numbers \cite{breiman_random_2001}Very good for variable importance (since this algorithm gives every variable the chance to appear in different contexts with different covariates) \cite{strobl_introduction_2009} & Possible overfitting concern \cite{segal_machine_2003}, \cite{philander_identifying_2014}, \cite{luellen_propensity_2005}complicated to interpret because there is no organisational manner by which the single trees disperse inside the forest, i.e. there is no nesting structure whatsoever - since every predictor may appear in different positions, or even trees \cite{strobl_introduction_2009}           \\ 
DT        & very computationally efficient, flexible, and also intuitively simple to implement \cite{friedl_decision_1997}robust and insensitive to noise \cite{friedl_decision_1997}simple to interpret and visualise by using simple data analytical techniques \cite{friedl_decision_1997}                                                                                                                                                                                                                                                          & can be readily susceptible to overfitting \cite{gupta_decision_2017}sensitive to variance \cite{gupta_decision_2017}                                                                                                                                                                                                                                                                                                                                     \\ 
ERT       & computationally quicker than random forest  with similar performance \cite{geurts_extremely_2006}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                & if the dataset contains a high number of noisy features, which was noted by the authors to have negatively affected the algorithm's overall performance \cite{geurts_extremely_2006}                                                                                                                                                                                                                                                                                        \\ 
RGF       & does not require the number of trees to build a hyper-parameter due to automatically calculating it as a result of the loss function minimisation \cite{noauthor_introductory_2018}Excellent prediction accuracy \cite{johnson_learning_2014}                                                                                                                                                                                                                                                                                                                 & slower training time \cite{johnson_learning_2014}                                                                                                                                                                                                                                                                                                                                                                                                                           \\ 
SVM       & Based on the concept of determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems \cite{noel_bambrick_support_2016}efficiently deal with datasets containing fewer samples \cite{guyon_gene_2002}                                                                                                                                                                                                                                                                            & tend to reduce efficiency significantly with noiser data \cite{noel_bambrick_support_2016}highly computationally expensive, resulting in slow training speeds \cite{noauthor_understanding_2017}Selecting the right kernel hyper-parameter plays a vital role in tuning this model and can also be considered as a setback of this model, as also noted \cite{fradkin_dimacs_nodate}, \cite{burges_tutorial_1998} \\ 
LOGREG    & fitting in cases where the predictor is dichotomous (can be split into two clusters, i.e., binary) \cite{statistics_solutions_what_2017}accessible development \cite{rouzier_direct_2009}                                                                                                                                                                                                                                                                                                                                                                    & overfitting - especially when the amount of parameter values increases too much - which in turn makes the algorithm highly inefficient \cite{philander_identifying_2014}                                                                                                                                                                                                                                                                                                    \\ 
BAGGING   & equalises the impact of sharp observations which improves performance in the case of weak points \cite{grandvalet_bagging_2004}                                                                                                                                                                                                                                                                                                                                                                                                                                                  & equalises the impact of sharp observations which harms performance in the case of strong points \cite{grandvalet_bagging_2004}                                                                                                                                                                                                                                                                                                                                              \\ 
ADABOOST  & performs well and quite fast \cite{freund_short_1999}pretty simple to implement - especially since it requires no tuning parameters to work (only the number of iterations) \cite{freund_short_1999}can be dynamically cohered with every base learning algorithm since it does not require any prior understanding of the weak points \cite{freund_short_1999}                                                                                                                                                                            & initial weak point weighting was slightly better than random, then an exponential drop in the training error was observed \cite{freund_short_1999}                                                                                                                                                                                                                                                                                                                          \\ 
XGB       & sparsity-aware operation \cite{analytics_vidhya_which_2017}offers a constructive cache-aware architecture for 'out-of-core' tree generation \cite{analytics_vidhya_which_2017}can also detect non-linear relations in datasets that contain missing values \cite{chen_xgboost:_2016}                                                                                                                                                                                                                                                     & Slower execution speed than LightGBM \cite{noauthor_lightgbm:_2018}                                                                                                                                                                                                                                                                                                                                                                                                         \\ 
LGB       & fast and highly accurate performances \cite{analytics_vidhya_which_2017}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        & higher loss function value \cite{wang_lightgbm:_2017}                                                                                                                                                                                                                                                                                                                                                                                                                       \\ 
ELM       & simple and efficient \cite{huang_extreme_2006}rapid learning process \cite{huang_extreme_2011}solves straightforwardly \cite{huang_extreme_2006}                                                                                                                                                                                                                                                                                                                                                                                           & No generalisation performance improvement (or slight improvement) \cite{huang_extreme_2006}, \cite{huang_extreme_2011}, \cite{huang_real-time_2006}preventing overfitting would require adaptation as the algorithm learns \cite{huang_extreme_2006}lack of deep-learning functionality (only one level of abstraction)                                                                                            \\ 
LDA       & Strong assumptions with equal covariances \cite{yan_comparison_2011}Lower computational cost compared to similar algorithms \cite{fisher_use_1936}, \cite{li_2d-lda:_2005}Mathematically robust \cite{fisher_use_1936}                                                                                                                                                                                                                                                                                                  & Assumptions are sometimes disrupted to produce good results \cite{yan_comparison_2011}. \& Image Classification \cite{li_2d-lda:_2005}LD function sometimes results less then 0 or more than 1 \cite{yan_comparison_2011}                                                                                                                                                                                                             \\ 
LR        & Simple to implement/understand \cite{noauthor_learn_2017}Can be used to determine the relationship between features \cite{noauthor_learn_2017}Optimal when relationships are linear.Able to determine the cost of the influence of the variables \cite{noauthor_advantages_nodate}                                                                                                                                                                                                                                                         & Prone to overfitting \cite{noauthor_disadvantages_nodate-1}, \cite{noauthor_learn_2017}Very sensitive to outliers \cite{noauthor_learn_2017}Limited to linear relationships \cite{noauthor_disadvantages_nodate-1}                                                                                                                                                                                                 \\ 
TS        & Analytics of confidence intervals \cite{fernandes_parametric_2005}Robust to outliers \cite{fernandes_parametric_2005}Very efficient when error distribution is discontinuous (distinct classes) \cite{peng_consistency_2008}                                                                                                                                                                                                                                                                                                               & Computationally complex \cite{plot.ly_theil-sen_2015}Loses some mathematical properties by working on random subsets \cite{plot.ly_theil-sen_2015}When a heteroscedastic error, biasedness is an issue \cite{wilcox_simulations_1998}                                                                                                                                                                                                 \\ 
RIDGE     & Prevents overfitting \cite{noauthor_complete_2016}Performs well (even with highly correlated variables) \cite{noauthor_complete_2016}Co-efficient shrinkage (reduces the model's complexity) \cite{noauthor_complete_2016}                                                                                                                                                                                                                                                                                                                 & Does not remove irrelevant features, but only minimises them \cite{chakon_practical_2017}                                                                                                                                                                                                                                                                                                                                                                                   \\ 
NB        & Simple and highly scalable \cite{hand_idiots_2001}Performs well (even with strong dependencies) \cite{zhang_optimality_2004}                                                                                                                                                                                                                                                                                                                                                                                                                                  & Can be biased \cite{hand_idiots_2001}Cannot learn relationships between features (assumes feature independence) \cite{hand_idiots_2001}Low precision and sensitivity with smaller datasets \cite{g._easterling_point_1973}                                                                                                                                                                                                           \\ 
SGD       & Can be used as an efficient optimisation algorithm \cite{noauthor_overview_2016}Versatile and simple \cite{bottou_stochastic_2012}Efficient at solving large-scale tasks \cite{zhang_solving_2004}                                                                                                                                                                                                                                                                                                                                         & Slow convergence rate \cite{schoenauer-sebag_stochastic_2017}Tuning the learning rate can be tedious and is very important \cite{vryniotis_tuning_2013}Sensitive to feature scaling \cite{noauthor_disadvantages_nodate}Requires multiple hyper-parameters \cite{noauthor_disadvantages_nodate}                                                                                                                    \\ \bottomrule
\end{longtable}
\twocolumn

\end{document}

This works fine. However, a full page is wasted before the table — David, May 27 '18 at 10:46

score 1 · Answer 3 · answered May 26 '18 at 22:05

another possibility is manual split table into two part and than use the macro \ContinuedFloat from the package caption. with use of the package stfloats the first part of table can be positioned at bottom of page, the second part of table on the top of the next page. for the second part of table is sensible that occupy whole page:

    \documentclass[a4paper, 10pt, conference]{ieeeconf}
    \usepackage{booktabs, tabularx}
    \usepackage{stfloats}
    \usepackage{caption}

    \usepackage{lipsum} % for dummy text

    \begin{document}
    \lipsum[11]
    %% first part of table
        \begin{table*}[b]
    \centering
    \caption{My caption}
    \small
    \label{my-label}
    \begin{tabularx}{\linewidth}{@{} l X X @{}}
        \toprule
    Technique & Possible Advantages & Possible Disadvantages    \\
        \midrule
    ANN
        & Excellent overall calibration error \cite{tollenaar_which_2013}high prediction accuracy \cite{mair_investigation_2000}, \cite{tollenaar_which_2013}, \cite{percy_predicting_2016}
            & Neural nets continuously reuse and perform combinations of the input variable through multiple analytical layers, which could make the learning process slow at times \cite{hardesty_explained:_2017} can get very complicated very quickly, making it slightly hard to interpret\cite{percy_predicting_2016}
            \\  \addlinespace
    KMeans
        & Clustering provides the functionality to discover and analyse any groups that have formed organically rather than defining the groups before looking at the data \cite{trevino_introduction_2016}
            & due to its high sensitivity to the starting points of the clustering centres, several runs would be indispensable to procure an optimal solution \cite{likas_global_2003}
        \\ \addlinespace
    KNN
        & simplistic implementation.KNNs are considered to be very flexible and adaptable due to its non-parametric property (no assumptions made on the underlying distribution of the data) \cite{noauthor_k-nearest_2017. }KNN is also an instance-based, lazy learning algorithm meaning that it does not generalise using the training data \cite{larose_knearest_2014}
            & this algorithm is more computationally expensive than traditional models (logistic regression and linear regression) \cite{henley_k-nearest-neighbour_1996}
        \\ \addlinespace
    RF
        & efficient execution on large data sets \cite{breiman_random_2001} handling numerous input variables without deletion \cite{breiman_random_2001} balancing the error in class populations \cite{breiman_random_2001} random forests do not overfit data because of the law of Large Numbers \cite{breiman_random_2001}. Very good for variable importance (since this algorithm gives every variable the chance to appear in different contexts with different covariates) \cite{strobl_introduction_2009}
            & Possible overfitting concern \cite{segal_machine_2003}, \cite{philander_identifying_2014}, \cite{luellen_propensity_2005} complicated to interpret because there is no organisational manner by which the single trees disperse inside the forest, i.e. there is no nesting structure whatsoever - since every predictor may appear in different positions, or even trees \cite{strobl_introduction_2009}
            \\  \addlinespace
    DT
        & very computationally efficient, flexible, and also intuitively simple to implement \cite{friedl_decision_1997}robust and insensitive to noise \cite{friedl_decision_1997} simple to interpret and visualise by using simple data analytical techniques \cite{friedl_decision_1997}
            & can be readily susceptible to overfitting \cite{gupta_decision_2017} sensitive to variance \cite{gupta_decision_2017}
            \\  \addlinespace
    ERT
        & computationally quicker than random forest  with similar performance \cite{geurts_extremely_2006}
            & if the dataset contains a high number of noisy features, which was noted by the authors to have negatively affected the algorithm's overall performance \cite{geurts_extremely_2006}
        \\  \addlinespace
    RGF
        & does not require the number of trees to build a hyper-parameter due to automatically calculating it as a result of the loss function minimisation \cite{noauthor_introductory_2018}. Excellent prediction accuracy \cite{johnson_learning_2014}
            & slower training time \cite{johnson_learning_2014}
        \\    \bottomrule
    \end{tabularx}
        \end{table*}
    %% second part of table
        \begin{table*}[tp]
    \ContinuedFloat
    \centering
    \caption{My caption}
    \small
    \label{my-label}
    \begin{tabularx}{\linewidth}{@{} l X X @{}}
        \toprule
    Technique & Possible Advantages & Possible Disadvantages    \\
        \midrule
    SVM
        & Based on the concept of determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems \cite{noel_bambrick_support_2016}efficiently deal with datasets containing fewer samples \cite{guyon_gene_2002}
            & tend to reduce efficiency significantly with noiser data \cite{noel_bambrick_support_2016} highly computationally expensive, resulting in slow training speeds \cite{noauthor_understanding_2017}. Selecting the right kernel hyper-parameter plays a vital role in tuning this model and can also be considered as a setback of this model, as also noted \cite{fradkin_dimacs_nodate}, \cite{burges_tutorial_1998}
        \\  \addlinespace
    LOGREG
        & fitting in cases where the predictor is dichotomous (can be split into two clusters, i.e., binary) \cite{statistics_solutions_what_2017} accessible development \cite{rouzier_direct_2009}
            & overfitting - especially when the amount of parameter values increases too much - which in turn makes the algorithm highly inefficient \cite{philander_identifying_2014}
        \\  \addlinespace
    BAGGING
        & equalises the impact of sharp observations which improves performance in the case of weak points \cite{grandvalet_bagging_2004}
            & equalises the impact of sharp observations which harms performance in the case of strong points \cite{grandvalet_bagging_2004}
        \\  \addlinespace
    ADABOOST  & performs well and quite fast \cite{freund_short_1999}pretty simple to implement - especially since it requires no tuning parameters to work (only the number of iterations) \cite{freund_short_1999}can be dynamically cohered with every base learning algorithm since it does not require any prior understanding of the weak points \cite{freund_short_1999}                                                                                                                                                                            & initial weak point weighting was slightly better than random, then an exponential drop in the training error was observed \cite{freund_short_1999}                                                                                                                                                                                                                                                                                                                          \\ \addlinespace
    XGB       & sparsity-aware operation \cite{analytics_vidhya_which_2017}offers a constructive cache-aware architecture for 'out-of-core' tree generation \cite{analytics_vidhya_which_2017}can also detect non-linear relations in datasets that contain missing values \cite{chen_xgboost:_2016}                                                                                                                                                                                                                                                     & Slower execution speed than LightGBM \cite{noauthor_lightgbm:_2018}                                                                                                                                                                                                                                                                                                                                                                                                         \\ \addlinespace
    LGB       & fast and highly accurate performances \cite{analytics_vidhya_which_2017}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        & higher loss function value \cite{wang_lightgbm:_2017}                                                                                                                                                                                                                                                                                                                                                                                                                       \\ \addlinespace
    ELM       & simple and efficient \cite{huang_extreme_2006}rapid learning process \cite{huang_extreme_2011}solves straightforwardly \cite{huang_extreme_2006}                                                                                                                                                                                                                                                                                                                                                                                           & No generalisation performance improvement (or slight improvement) \cite{huang_extreme_2006}, \cite{huang_extreme_2011}, \cite{huang_real-time_2006}preventing overfitting would require adaptation as the algorithm learns \cite{huang_extreme_2006}lack of deep-learning functionality (only one level of abstraction)                                                                                            \\ \addlinespace
    LDA       & Strong assumptions with equal covariances \cite{yan_comparison_2011}Lower computational cost compared to similar algorithms \cite{fisher_use_1936}, \cite{li_2d-lda:_2005}Mathematically robust \cite{fisher_use_1936}                                                                                                                                                                                                                                                                                                  & Assumptions are sometimes disrupted to produce good results \cite{yan_comparison_2011}. \& Image Classification \cite{li_2d-lda:_2005}LD function sometimes results less then 0 or more than 1 \cite{yan_comparison_2011}                                                                                                                                                                                                             \\ \addlinespace
    LR        & Simple to implement/understand \cite{noauthor_learn_2017}Can be used to determine the relationship between features \cite{noauthor_learn_2017}Optimal when relationships are linear.Able to determine the cost of the influence of the variables \cite{noauthor_advantages_nodate}                                                                                                                                                                                                                                                         & Prone to overfitting \cite{noauthor_disadvantages_nodate-1}, \cite{noauthor_learn_2017}Very sensitive to outliers \cite{noauthor_learn_2017}Limited to linear relationships \cite{noauthor_disadvantages_nodate-1}                                                                                                                                                                                                 \\ \addlinespace
    TS        & Analytics of confidence intervals \cite{fernandes_parametric_2005}Robust to outliers \cite{fernandes_parametric_2005}Very efficient when error distribution is discontinuous (distinct classes) \cite{peng_consistency_2008}                                                                                                                                                                                                                                                                                                               & Computationally complex \cite{plot.ly_theil-sen_2015}Loses some mathematical properties by working on random subsets \cite{plot.ly_theil-sen_2015}When a heteroscedastic error, biasedness is an issue \cite{wilcox_simulations_1998}                                                                                                                                                                                                 \\ \addlinespace
    RIDGE     & Prevents overfitting \cite{noauthor_complete_2016}Performs well (even with highly correlated variables) \cite{noauthor_complete_2016}. Co-efficient shrinkage (reduces the model's complexity) \cite{noauthor_complete_2016}                                                                                                                                                                                                                                                                                                                 & Does not remove irrelevant features, but only minimises them \cite{chakon_practical_2017}                                                                                                                                                                                                                                                                                                                                                                                   \\ \addlinespace
    NB        & Simple and highly scalable \cite{hand_idiots_2001}Performs well (even with strong dependencies) \cite{zhang_optimality_2004}                                                                                                                                                                                                                                                                                                                                                                                                                                  & Can be biased \cite{hand_idiots_2001}Cannot learn relationships between features (assumes feature independence) \cite{hand_idiots_2001}Low precision and sensitivity with smaller datasets \cite{g._easterling_point_1973}                                                                                                                                                                                                           \\ \addlinespace        SGD       & Can be used as an efficient optimisation algorithm \cite{noauthor_overview_2016}Versatile and simple \cite{bottou_stochastic_2012}Efficient at solving large-scale tasks \cite{zhang_solving_2004}                                                                                                                                                                                                                                                                                                                                         & Slow convergence rate \cite{schoenauer-sebag_stochastic_2017}. Tuning the learning rate can be tedious and is very important \cite{vryniotis_tuning_2013}. Sensitive to feature scaling \cite{noauthor_disadvantages_nodate}. Requires multiple hyper-parameters \cite{noauthor_disadvantages_nodate}                                                                                                                    \\ \bottomrule
\end{tabularx}
    \end{table*}
\lipsum
\end{document}

score 1 · Accepted Answer · answered May 26 '18 at 23:24

Here is a solution based on xtab, which defines an xtabular environment, and an xtabular* version for two column mode. However I couldn't make it work as expected, so I resorted to a work-around: inserting axtabular in a strip environment (from the cuted package), which switches temporarily to one-column mode:

\documentclass[a4paper, 10pt, conference]{ieeeconf}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{array, xtab, caption}
\usepackage{cuted}
\usepackage{lipsum}
\usepackage{bigstrut}

\begin{document}
Some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text. Some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text some text.

Some more text some more text some more text some more text some more text some more text some more text some more text some more text. Some more text some more text some more text some more text some more text some more text some more text some more text.
 \begin{strip}
\centering
\setlength{\extrarowheight}{2pt}
\tablecaption{My caption}
\label{my-label}
\tablehead{\hline
Technique & Possible Advantages & Possible Disadvantages \\ \hline}
\tabletail{\hline \multicolumn{3}{r}{\bigstrut[t] \em To be continued}\\}
\tablelasttail{\hline }
\begin{xtabular}{|l|p{0.4\textwidth}|p{0.4\textwidth}|}
ANN & Excellent overall calibration error \cite{tollenaar_which_2013}high prediction accuracy \cite{mair_investigation_2000}, \cite{tollenaar_which_2013}, \cite{percy_predicting_2016} & Neural nets continuously reuse and perform combinations of the input variable through multiple analytical layers, which could make the learning process slow at times \cite{hardesty_explained:_2017}can get very complicated very quickly, making it slightly hard to interpret\cite{percy_predicting_2016} \\
\shrinkheight{-20ex} \hline
KMeans & Clustering provides the functionality to discover and analyse any groups that have formed organically rather than defining the groups before looking at the data \cite{trevino_introduction_2016} & due to its high sensitivity to the starting points of the clustering centres, several runs would be indispensable to procure an optimal solution \cite{likas_global_2003} \\ \hline
KNN & simplistic implementation.KNNs are considered to be very flexible and adaptable due to its non-parametric property (no assumptions made on the underlying distribution of the data) \cite{noauthor_k-nearest_2017}KNN is also an instance-based, lazy learning algorithm meaning that it does not generalise using the training data \cite{larose_knearest_2014} & this algorithm is more computationally expensive than traditional models (logistic regression and linear regression) \cite{henley_k-nearest-neighbour_1996} \\ \hline
RF & efficient execution on large data sets \cite{breiman_random_2001}handling numerous input variables without deletion \cite{breiman_random_2001}balancing the error in class populations \cite{breiman_random_2001}random forests do not overfit data because of the law of Large Numbers \cite{breiman_random_2001}Very good for variable importance (since this algorithm gives every variable the chance to appear in different contexts with different covariates) \cite{strobl_introduction_2009} & Possible overfitting concern \cite{segal_machine_2003}, \cite{philander_identifying_2014}, \cite{luellen_propensity_2005}complicated to interpret because there is no organisational manner by which the single trees disperse inside the forest, i.e. there is no nesting structure whatsoever - since every predictor may appear in different positions, or even trees \cite{strobl_introduction_2009} \\ \hline
DT & very computationally efficient, flexible, and also intuitively simple to implement \cite{friedl_decision_1997}robust and insensitive to noise \cite{friedl_decision_1997}simple to interpret and visualise by using simple data analytical techniques \cite{friedl_decision_1997} & can be readily susceptible to overfitting \cite{gupta_decision_2017}sensitive to variance \cite{gupta_decision_2017} \\ \hline
ERT & computationally quicker than random forest with similar performance \cite{geurts_extremely_2006} & if the dataset contains a high number of noisy features, which was noted by the authors to have negatively affected the algorithm's overall performance \cite{geurts_extremely_2006} \\ \hline
RGF & does not require the number of trees to build a hyper-parameter due to automatically calculating it as a result of the loss function minimisation \cite{noauthor_introductory_2018}Excellent prediction accuracy \cite{johnson_learning_2014} & slower training time \cite{johnson_learning_2014} \\ \hline
SVM & Based on the concept of determining the best hyperplane that splits the given dataset into two partitions makes it especially fitting for classification problems \cite{noel_bambrick_support_2016}efficiently deal with datasets containing fewer samples \cite{guyon_gene_2002} & tend to reduce efficiency significantly with noiser data \cite{noel_bambrick_support_2016}highly computationally expensive, resulting in slow training speeds \cite{noauthor_understanding_2017}Selecting the right kernel hyper-parameter plays a vital role in tuning this model and can also be considered as a setback of this model, as also noted \cite{fradkin_dimacs_nodate}, \cite{burges_tutorial_1998} \\ \hline
LOGREG & fitting in cases where the predictor is dichotomous (can be split into two clusters, i.e., binary) \cite{statistics_solutions_what_2017}accessible development \cite{rouzier_direct_2009} & overfitting - especially when the amount of parameter values increases too much - which in turn makes the algorithm highly inefficient \cite{philander_identifying_2014} \\ \hline
BAGGING & equalises the impact of sharp observations which improves performance in the case of weak points \cite{grandvalet_bagging_2004} & equalises the impact of sharp observations which harms performance in the case of strong points \cite{grandvalet_bagging_2004} \\ \hline
ADABOOST & performs well and quite fast \cite{freund_short_1999}pretty simple to implement - especially since it requires no tuning parameters to work (only the number of iterations) \cite{freund_short_1999}can be dynamically cohered with every base learning algorithm since it does not require any prior understanding of the weak points \cite{freund_short_1999} & initial weak point weighting was slightly better than random, then an exponential drop in the training error was observed \cite{freund_short_1999} \\ \hline
XGB & sparsity-aware operation \cite{analytics_vidhya_which_2017}offers a constructive cache-aware architecture for 'out-of-core' tree generation \cite{analytics_vidhya_which_2017}can also detect non-linear relations in datasets that contain missing values \cite{chen_xgboost:_2016} & Slower execution speed than LightGBM \cite{noauthor_lightgbm:_2018} \\ \hline
LGB & fast and highly accurate performances \cite{analytics_vidhya_which_2017} & higher loss function value \cite{wang_lightgbm:_2017} \\ \shrinkheight{-20ex}\hline
ELM & simple and efficient \cite{huang_extreme_2006}rapid learning process \cite{huang_extreme_2011}solves straightforwardly \cite{huang_extreme_2006} & No generalisation performance improvement (or slight improvement) \cite{huang_extreme_2006}, \cite{huang_extreme_2011}, \cite{huang_real-time_2006}preventing overfitting would require adaptation as the algorithm learns \cite{huang_extreme_2006}lack of deep-learning functionality (only one level of abstraction) \\ \hline
LDA & Strong assumptions with equal covariances \cite{yan_comparison_2011}Lower computational cost compared to similar algorithms \cite{fisher_use_1936}, \cite{li_2d-lda:_2005}Mathematically robust \cite{fisher_use_1936} & Assumptions are sometimes disrupted to produce good results \cite{yan_comparison_2011}. \& Image Classification \cite{li_2d-lda:_2005}LD function sometimes results less then 0 or more than 1 \cite{yan_comparison_2011} \\ \hline
LR & Simple to implement/understand \cite{noauthor_learn_2017}Can be used to determine the relationship between features \cite{noauthor_learn_2017}Optimal when relationships are linear.Able to determine the cost of the influence of the variables \cite{noauthor_advantages_nodate} & Prone to overfitting \cite{noauthor_disadvantages_nodate-1}, \cite{noauthor_learn_2017}Very sensitive to outliers \cite{noauthor_learn_2017}Limited to linear relationships \cite{noauthor_disadvantages_nodate-1} \\ \hline
TS & Analytics of confidence intervals \cite{fernandes_parametric_2005}Robust to outliers \cite{fernandes_parametric_2005}Very efficient when error distribution is discontinuous (distinct classes) \cite{peng_consistency_2008} & Computationally complex \cite{plot.ly_theil-sen_2015}Loses some mathematical properties by working on random subsets \cite{plot.ly_theil-sen_2015}When a heteroscedastic error, biasedness is an issue \cite{wilcox_simulations_1998} \\ \hline
RIDGE & Prevents overfitting \cite{noauthor_complete_2016}Performs well (even with highly correlated variables) \cite{noauthor_complete_2016}Co-efficient shrinkage (reduces the model's complexity) \cite{noauthor_complete_2016} & Does not remove irrelevant features, but only minimises them \cite{chakon_practical_2017} \\ \hline
NB & Simple and highly scalable \cite{hand_idiots_2001}Performs well (even with strong dependencies) \cite{zhang_optimality_2004} & Can be biased \cite{hand_idiots_2001}Cannot learn relationships between features (assumes feature independence) \cite{hand_idiots_2001}Low precision and sensitivity with smaller datasets \cite{g._easterling_point_1973} \\ \hline
SGD & Can be used as an efficient optimisation algorithm \cite{noauthor_overview_2016}Versatile and simple \cite{bottou_stochastic_2012}Efficient at solving large-scale tasks \cite{zhang_solving_2004} & Slow convergence rate \cite{schoenauer-sebag_stochastic_2017}Tuning the learning rate can be tedious and is very important \cite{vryniotis_tuning_2013}Sensitive to feature scaling \cite{noauthor_disadvantages_nodate}Requires multiple hyper-parameters \cite{noauthor_disadvantages_nodate} \\
\end{xtabular}
\end{strip}
    \lipsum[3-20]

    \end{document}

*

Last page of the table (out of 3):

This works great. However, the next section that follows has an odd layout. On the next page, only the right column is filled with the text (left column is empty). Everything returns to normal the page after that. — David, May 27 '18 at 10:30
I know supertabular and xtab are not perfect. xtabprovides some tools to fine-tune the placement of layout (one of these I used in my code is \shrinkheight, which does something like \enlargethispage for ordinary text, when the page is broken too soon or too late). You'll find their list in the documentation. Also, as I used cuted, tha table no more floats, so you may change the point at which you insert it to obtain optimal results. In any case, if you can't get a satisfactory result, you should post a new question with the problematic code. — Bernard, May 27 '18 at 11:16
I managed to make this work. However, a slight issue is that the first split is being made at halfway of the page; thus, leaving half the page empty. The other splits work fine. Do you know how to fix this, please? — David, May 27 '18 at 13:19
This is what I had to do in my code: use \shrinkheight{some negative value} after the first \\. The value is to be found by trial and error. — Bernard, May 27 '18 at 13:34

Transforming a table to a supertabular

4 Answers4

Linked