Unwanted page breaks in a multilingual document

2024-5-25
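I am writing a thesis that contains both Arabic and English text. It is finished, but I am getting unwanted page breaks that leave nearly a quarter of the page blank at the bottom. The breaks occur (1) between the items of an enumerate environment and (2) in the middle of a paragraph, where a line break jumps to a new page.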
\documentclass[12pt]{report}
% margins
\usepackage[top=1.0in, bottom=1.0in, left=1.5in, right=1.0in]{geometry}
%\usepackage[top=1.0in, bottom=1.2in, left=1.7in, right=1.0in]{geometry}
% double spacing
\usepackage{setspace}
\doublespacing
\usepackage[table]{xcolor}
% graphics and subfigures
\usepackage{graphicx}
\usepackage[captionskip=8pt, nearskip=10pt]{subfig}
\usepackage{float}
\usepackage{lscape}
\usepackage{tikz}
\usepackage{multirow}
\usepackage[english]{babel}
\usepackage{arabtex}
\usepackage{utf8}
\setcode{utf8}
% citations
\usepackage{cite}
% abbreviations
\usepackage{nomencl}
\makenomenclature
% math
\usepackage{amsmath}
\usepackage{amssymb}
%\usepackage{enumerate}
%\usepackage{enumitem}
\usepackage{lmodern}
\usepackage{array}
\usepackage{longtable}
% algorithms
\usepackage{algorithm}
\usepackage{algorithmic}
\parindent=0pt
%\renewcommand{\listalgorithmname}{LIST OF ALGORITHMS}
%\renewcommand{\thealgorithm}{\thechapter.\arabic{algorithm}}
%\newcommand{\listofalgorithmsbreak}{\addtocontents{loa}{\protect\vspace{8pt}}}
\usepackage{titlesec}
\setcounter{secnumdepth}{4}
%\setcounter{tocdepth}{4}
\titleformat{\paragraph}
{\normalfont\normalsize\bfseries}{\theparagraph}{1em}{}
\titlespacing*{\paragraph}{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
%definitions
\newtheorem {myDef} {Definition}
\newtheorem{myEx}{Example}
%\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
% capitalize names of sections
\def\contentsname{TABLE OF CONTENTS}
\def\listtablename{LIST OF TABLES}
\def\listfigurename{LIST OF FIGURES}
\def\bibname{REFERENCES}
% section headers
%
% http://www.latex-community.org/forum/viewtopic.php?f=5&t=1270
%
%\makeatletter
%
%\renewcommand*\@makechapterhead[1]{%
%  \vspace*{50\p@}%
%  {\parindent \z@ \centering \normalfont
%    \huge\bfseries
%    \ifnum \c@secnumdepth >\m@ne
%         \thechapter.\space
%    \fi
%    #1\par\nobreak
%    \vskip 36\p@
%  }}
%\makeatother
%
%\makeatletter
%\renewcommand*\@makeschapterhead[1]{%
%  \vspace*{50\p@}%
%  {\parindent \z@ \centering \normalfont
%    \huge\bfseries
%    \ifnum \c@secnumdepth >\m@ne
%    \fi
%    #1\par\nobreak
%    \vskip 36\p@
%  }}
%\makeatother
\definecolor{orange1}{RGB}{250,50,10}
\begin{document}
\begin{enumerate}
\item[A-] Tokenization: tokenization is among the widely adopted methods in sentiment analysis projects, where the text is divided into individual tokens such as words (terms) \cite{Awajan2018}, characters and phrases \cite{Mansour2017} in the process of sentence segmentation \cite{Awajan2018}.
\item[B-] Case Normalization: in case normalization, entire sentences or documents are converted to lowercase or vice versa \cite{Joshi2014}. Moreover, in Arabic, several characters can appear in various shapes (for instance, Taa Marboutah “\RL{ة}” and Haa Marboutah “\RL{ه}”). Consequently, this stage also normalizes the spelling of such characters \cite{Awajan2018}. In English, a term written entirely in capital letters can reflect a strong emotion or sentiment \cite{Mansour2017}.
\item[C-] Exclusion of Foreign Letters: in English, data that does not consist of the letters \{A-Z, a-z\} should be excluded \cite{Awajan2018}.
\item[D-] Stop Word Removal: in many cases stop words are useless for processing. Hence, they are discarded, which saves space and time \cite{Awajan2018}. Examples of Arabic stop words are ‘fe’, ‘lan’ and ‘kan’ (\RL{في, لن, كان}). The benefits of removing these words are improved effectiveness, shorter response time and reduced index space. However, a single unified list of stop words to delete does not exist yet \cite{Mansour2017}.
\item[E-] Handling Negation: special words (such as not, no, never and so forth) flip the sentiment polarity from negative to positive or vice versa \cite{Awajan2018}.
\item[F-] Acronyms Expansion: acronyms are expanded to their original terms via a dictionary of acronyms \cite{Awajan2018}.
\item[G-] Spelling Checking: reviews and social media data contain various spelling mistakes, such as extra or missing letters, which should be corrected \cite{Mansour2017}.
\item[H-] Replacing Characters: replacing variant character forms with a single form can enhance prediction accuracy. For instance, in Arabic, some letters vary as follows \cite{Mansour2017}:
\item[-] Hamza: the hamza letter “\RL{ء}” takes interchangeable forms depending on its position and the word. These include \RL{ئ, ؤ, أ}.
\item[-] Ta Marboutah is sometimes used interchangeably with the ha \RL{ه}.
\item[-] Alef: this character has multiple forms in Arabic that are used interchangeably, namely \RL{ا, أ, إ}. Moreover, Alef Maqsourah \RL{ى} is sometimes mistakenly written as \RL{ي}.
\item[I-] Identify or Delete Punctuation: punctuation marks, including commas and stop words, are deleted because they are not needed to identify the polarity of the text. In some situations, however, punctuation can reflect the polarity of the text: for instance, a question mark reflects perplexity and an exclamation mark reflects a strong feeling \cite{Mansour2017} (anger, delight, wonder and so forth \cite{Maryland}).
\item[J-] Stemming or Lemmatizing: stemming aims to reduce related tokens to a single form. The common stemming process includes the identification and removal of suffixes, prefixes and inappropriate pluralization \cite{Joshi2014}. Approaches include light stemming, statistical stemming, root-based stemming and a hybrid technique \cite{Awajan2018}. Lemmatization, on the other hand, identifies the base form (“lemma”) of every inflected term in a given sentence. The advantages of lemmatization match those of stemming \cite{Laurikkala2004}. For instance, inflections such as going, goes and gone are stemmed to “go”, but “went” is not mapped to “go”. Lemmatization, by contrast, maps “went” to the lemma “go”. The following illustrates stemming and lemmatizing \cite{Jivani2011}:
\item[K-] Stemming: introduces, introducing, introduction \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture}  introduc \\
Goes, going, gone \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} go \\
Lemmatizing: introduces, introducing, introduction \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} introduce\\
Goes, going, gone \begin{tikzpicture}
\draw[->,>=stealth,line width = 3pt] (0,0) -- (1.5,0);
\end{tikzpicture} went, go
\item[L-] Filtering: deleting irrelevant data such as emoticons, special words, repeated letters, URL links, user names and so forth \cite{Awajan2018}.
\end{enumerate}
\end{document}
Answer 1

I found a workaround for the page-break problem using minipage, but it is clearly not the best solution. If there is a better one, please share it.
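
The answer does not include its code. As a minimal sketch of what a minipage-based workaround could look like (an assumption, not the poster's actual code), each item body can be wrapped in a minipage so that TeX cannot break the page inside it. The item text is shortened from the question, and the citations and Arabic packages are left out to keep the example self-contained.

```latex
\documentclass[12pt]{report}
\usepackage[top=1.0in, bottom=1.0in, left=1.5in, right=1.0in]{geometry}
\usepackage{setspace}
\doublespacing
\parindent=0pt

\begin{document}
\begin{enumerate}
% Assumption: one minipage per item body. The [t] option aligns the
% first line of the minipage with the item label.
\item[A-] \begin{minipage}[t]{\linewidth}
  Tokenization: the text is divided into individual tokens such as words,
  characters and phrases.
\end{minipage}
\item[C-] \begin{minipage}[t]{\linewidth}
  Exclusion of Foreign Letters: in English, data that does not consist of
  the letters \{A-Z, a-z\} should be excluded.
\end{minipage}
\end{enumerate}
\end{document}
```

Because a minipage is a single unbreakable box, an item that does not fit in the remaining space moves to the next page as a whole. That may be why the poster regards this as a stopgap rather than a proper fix.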