\documentclass[a4paper, 10pt, conference]{IEEEconf}
\caption{Possible advantages and disadvantages of each technique}
\begin{tabularx}{\textwidth}{@{} lLL @{}}
Technique & Possible Advantages & Possible Disadvantages
\\ \midrule
ANN       & Excellent overall calibration error~\cite{tollenaar_which_2013}. High prediction accuracy~\cite{mair_investigation_2000, tollenaar_which_2013, percy_predicting_2016}.                                                                                                                                                                                                                                                                                                                                       
& Neural networks repeatedly recombine the input variables through multiple layers, which can make the learning process slow~\cite{hardesty_explained:_2017}. Models can become complicated very quickly, making them hard to interpret~\cite{percy_predicting_2016}.
\\ \addlinespace
KMeans    & Discovers and analyses groups that have formed organically in the data, rather than requiring the groups to be defined before looking at the data~\cite{trevino_introduction_2016}.
& Highly sensitive to the starting points of the cluster centres, so several runs are needed to approach an optimal solution~\cite{likas_global_2003}.
\\ \addlinespace
KNN       & Simple to implement. KNN is considered very flexible and adaptable because it is non-parametric (no assumptions are made about the underlying distribution of the data)~\cite{noauthor_k-nearest_2017}. KNN is also an instance-based, lazy learning algorithm, meaning that it does not build a generalised model from the training data in advance~\cite{larose_knearest_2014}.
& More computationally expensive than traditional models such as logistic regression and linear regression~\cite{henley_k-nearest-neighbour_1996}.
\\ \addlinespace
RF        & Efficient execution on large data sets~\cite{breiman_random_2001}. Handles numerous input variables without variable deletion~\cite{breiman_random_2001}. Balances the error in class populations~\cite{breiman_random_2001}. Does not overfit as more trees are added, owing to the Law of Large Numbers~\cite{breiman_random_2001}. Well suited to assessing variable importance, since every variable has the chance to appear in different contexts with different covariates~\cite{strobl_introduction_2009}.
& Possible overfitting concern~\cite{segal_machine_2003, philander_identifying_2014, luellen_propensity_2005}. Hard to interpret because the individual trees in the forest are not nested in any way: every predictor may appear in different positions, or even in different trees~\cite{strobl_introduction_2009}.
\\ \addlinespace
DT        & Very computationally efficient, flexible, and also intuitively simple to implement~\cite{friedl_decision_1997}. Robust and insensitive to noise~\cite{friedl_decision_1997}. Simple to interpret and visualise by using simple data analytical techniques~\cite{friedl_decision_1997}.
& Readily susceptible to overfitting~\cite{gupta_decision_2017}. Sensitive to variance~\cite{gupta_decision_2017}.
\\ \addlinespace
ERT       & Computationally faster than random forests, with similar performance~\cite{geurts_extremely_2006}.
& Overall performance degrades when the dataset contains a high number of noisy features, as noted by the authors~\cite{geurts_extremely_2006}.
\\ \addlinespace
RGF       & Does not require the number of trees to build as a hyper-parameter, because it is calculated automatically as a result of the loss function minimisation~\cite{noauthor_introductory_2018}. Excellent prediction accuracy~\cite{johnson_learning_2014}.
& Slower training time~\cite{johnson_learning_2014}.
\\ \addlinespace
SVM       & Finds the best hyperplane that splits the given dataset into two partitions, which makes it especially suited to classification problems~\cite{noel_bambrick_support_2016}. Deals efficiently with datasets containing fewer samples~\cite{guyon_gene_2002}.
& Efficiency tends to drop significantly with noisier data~\cite{noel_bambrick_support_2016}. Highly computationally expensive, resulting in slow training speeds~\cite{noauthor_understanding_2017}. Selecting the right kernel hyper-parameter is vital when tuning this model, which can itself be considered a drawback~\cite{fradkin_dimacs_nodate, burges_tutorial_1998}.
\\ \addlinespace
LOGREG    & Appropriate when the dependent variable is dichotomous (binary, i.e., it takes one of two values)~\cite{statistics_solutions_what_2017}. Straightforward to develop~\cite{rouzier_direct_2009}.
& Prone to overfitting, especially when the number of parameters grows too large, which in turn makes the algorithm highly inefficient~\cite{philander_identifying_2014}.
\\ \bottomrule
\multicolumn{3}{r@{}}{\emph{Continued on next page}}\\

\caption{Possible advantages and disadvantages of each technique (continued)}
\begin{tabularx}{\textwidth}{@{} lLL @{}}
Technique & Possible Advantages & Possible Disadvantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
\\ \midrule
BAGGING   & Equalises the impact of sharp observations, which improves performance in the case of weak points~\cite{grandvalet_bagging_2004}.
& Equalises the impact of sharp observations, which harms performance in the case of strong points~\cite{grandvalet_bagging_2004}.
\\ \addlinespace
ADABOOST  & Performs well and runs quickly~\cite{freund_short_1999}. Simple to implement, especially since the only parameter to tune is the number of iterations~\cite{freund_short_1999}. Can be flexibly combined with any base learning algorithm, since it requires no prior knowledge about the weak learner~\cite{freund_short_1999}.
& The initial weak-point weighting was only slightly better than random, after which an exponential drop in the training error was observed~\cite{freund_short_1999}.
\\ \addlinespace
XGB       & Sparsity-aware operation~\cite{analytics_vidhya_which_2017}. Offers a constructive cache-aware architecture for `out-of-core' tree generation~\cite{analytics_vidhya_which_2017}. Can also detect non-linear relations in datasets that contain missing values~\cite{chen_xgboost:_2016}.
& Slower execution speed than LightGBM~\cite{noauthor_lightgbm:_2018}.
\\ \addlinespace
LGB       & Fast and highly accurate performance~\cite{analytics_vidhya_which_2017}.
& Higher loss function value~\cite{wang_lightgbm:_2017}.
\\ \addlinespace
ELM       & Simple and efficient~\cite{huang_extreme_2006}. Rapid learning process~\cite{huang_extreme_2011}. Solves straightforwardly~\cite{huang_extreme_2006}.
& Little or no improvement in generalisation performance~\cite{huang_extreme_2006, huang_extreme_2011, huang_real-time_2006}. Preventing overfitting would require adaptation as the algorithm learns~\cite{huang_extreme_2006}. Lacks deep-learning functionality (only one level of abstraction).
\\ \addlinespace
LDA       & Strong assumptions with equal covariances~\cite{yan_comparison_2011}. Lower computational cost compared to similar algorithms~\cite{fisher_use_1936,li_2d-lda:_2005}. Mathematically robust~\cite{fisher_use_1936}.
& Assumptions are sometimes violated to produce good results~\cite{yan_comparison_2011}. The LD function sometimes yields values less than~0 or greater than~1~\cite{yan_comparison_2011}.
\\ \addlinespace
LR        & Simple to implement and understand~\cite{noauthor_learn_2017}. Can be used to determine the relationship between features~\cite{noauthor_learn_2017}. Optimal when relationships are linear. Able to determine the cost of the influence of the variables~\cite{noauthor_advantages_nodate}.
& Prone to overfitting~\cite{noauthor_disadvantages_nodate-1, noauthor_learn_2017}. Very sensitive to outliers~\cite{noauthor_learn_2017}. Limited to linear relationships~\cite{noauthor_disadvantages_nodate-1}.
\\ \addlinespace
TS        & Analytical estimates of confidence intervals~\cite{fernandes_parametric_2005}. Robust to outliers~\cite{fernandes_parametric_2005}. Very efficient when the error distribution is discontinuous (distinct classes)~\cite{peng_consistency_2008}.
& Computationally complex~\cite{plot.ly_theil-sen_2015}. Loses some mathematical properties by working on random subsets~\cite{plot.ly_theil-sen_2015}. Bias is an issue when the errors are heteroscedastic~\cite{wilcox_simulations_1998}.
\\ \addlinespace
RIDGE     & Prevents overfitting~\cite{noauthor_complete_2016}. Performs well (even with highly correlated variables)~\cite{noauthor_complete_2016}. Coefficient shrinkage (reduces the model's complexity)~\cite{noauthor_complete_2016}.
& Does not remove irrelevant features, but only shrinks their coefficients~\cite{chakon_practical_2017}.
\\ \addlinespace
NB        & Simple and highly scalable~\cite{hand_idiots_2001}. Performs well (even with strong dependencies)~\cite{zhang_optimality_2004}.
& Can be biased~\cite{hand_idiots_2001}. Cannot learn relationships between features (assumes feature independence)~\cite{hand_idiots_2001}. Low precision and sensitivity with smaller datasets~\cite{g._easterling_point_1973}.
\\ \addlinespace
SGD       & Can be used as an efficient optimisation algorithm~\cite{noauthor_overview_2016}. Versatile and simple~\cite{bottou_stochastic_2012}. Efficient at solving large-scale tasks~\cite{zhang_solving_2004}.                                                                                                                                                                                                                                                                                                                                          
& Slow convergence rate~\cite{schoenauer-sebag_stochastic_2017}. Tuning the learning rate is very important and can be tedious~\cite{vryniotis_tuning_2013}. Sensitive to feature scaling~\cite{noauthor_disadvantages_nodate}. Requires multiple hyper-parameters~\cite{noauthor_disadvantages_nodate}.
\\ \bottomrule



