Replacing strange characters with a sed command

I want to create a sed command that removes all of these strange characters from a given document:

sed -n 's/\|®MD-IT¯\|®MD\+BO¯\|®MDNM¯®LL\.8LI,0LI¯\|®LL0LI,0LI¯\|®MD\+IT¯\|®LL.8LI,0LI¯®MDIT¯\|®MDNM¯®FL¯®LL.8LI,0LI¯\|®FL¯®MD-BO¯\|®FL¯®MD-BO¯\|®MD-BO¯\|¯®OF1IN,1IN¯®FC¯®LL1LI,0LI¯\|\|®SF1,1¯\|®FM1FT=0LI,LR=1;\|®MDSU¯®FN1¯\|®MDNM¯¯\|®IV-RTF\|\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\|¯®BF0¯\|®FS1\|-------------------------------------\|¯®FW1\|\|//gp'

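These codes were all created by another application, Nota Bene. I have many files containing such codes, and I would like to convert them to plain text, or possibly even Markdown.

The problem is that the characters are not being replaced. I tried the same thing in Sublime Text and successfully stripped the document using find and replace (with regular expressions). For this task I would rather have a sed script than use Sublime.

I also tried ed, but it did not make the replacements either.

Here is a sample nb file as it appears when opened in Sublime Text: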

®SSDEFAULTS¯®LR1¯®JU¯®MD+BO¯®UFTimes New Roman¯®SZ12Pt¯Glossary®MD+BO¯®TS.5IN,1IN,1.5IN,2IN,2.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN¯    ®MD-BO¯
®NJ¯®LR1¯®LL.5LI,0LI¯®MD+BO¯®LL0LI,0LI¯®MDNM¯®LR1¯®LL.5LI,0LI¯A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon ®GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=¯(Brown, Fitzmyer, Murphy, et al. 1990, 245)®GC¯.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Beth essentiae - ®LAHebrew¯ÿHá®LAEnglish¯ that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).

This is how I would like the text to read:

Glossary    
A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
|> sed -n l Glossary.NB
\256SSDEFAULTS\257\256LR1\257\256JU\257\256MD+BO\257\256UFTimes New R\
oman\257\256SZ12Pt\257Glossary\256MD+BO\257\256TS.5IN,1IN,1.5IN,2IN,2\
.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN\257\t\256MD-BO\257\r$
\256NJ\257\256LR1\257\256LL.5LI,0LI\257\256MD+BO\257\256LL0LI,0LI\257\
\256MDNM\257\256LR1\257\256LL.5LI,0LI\257A fortiori proposition: If X\
 is true, then how much greater is Y true? To move logically from a s\
tronger argument to establish a weaker argument. The weaker argument \
is sometimes presented by the speaker as the stronger argument.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Accusative of motion/direction - Indicates mov\
ement to the noun marked by the accusative and is to be distinguished\
 from the accusative of local determination which indicates location \
without motion (Jo\374on and Muraoka 2006, 428).\r$
Anadiplosis - A figure of speech in which the word that a colon ends \
with, or a like sounding word, is the word that begins the next colon\
 \256GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical \
commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=\257(Brown, Fit\
zmyer, Murphy, et al. 1990,\240245)\256GC\257.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaphoric use of the article - When the articl\
e is used to indicate that the word to which it is attached is the on\
e previously mentioned (Williams and Beckman 2007, 36). \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaptyxis - The insertion of a vowel into a wo\
rd to avoid a consonant cluster.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Aoristic perfect - I use the phrase 'aoristic \
perfect' to refer to one of the ways the qatal form can be rendered i\
nto English. Aoristic perfect denotes a past situation the implicatio\
ns of which are no longer felt in the present. The situation may have\
 extended over a period of time and it may have occurred more than on\
ce. It may have occurred in the recent or distant past but from the s\
tandpoint of the speaker it is to be regarded as a fact having occurr\
ed and hence as a fact belonging to the past (Jo\374on and Muraoka 20\
06, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the\
 other categorizations of perfect in this grammar, all relate to the \
interpretation of qatal verbs in their given contexts. The qatal form\
 in and of itself does not convey these meanings. \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Beth essentiae - \256LAHebrew\257\377H\341\256\
LAEnglish\257 that is used to indicate the predicate of a clause or a\
 word used predicatively (Jo\374on and Muraoka 2006, 458).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Classic perfect - I use the phrase 'classic pe\
rfect' to refer to one of the ways the qatal form can be rendered int\
o English. Classic perfect refers to the continuing present relevance\
 of a past situation from the perspective of the speaker (Comrie 1976\
, 52). By perfect I do not necessarily imply that a previous situatio\
n has resulted in a state but that the situation has implications rel\
evant to the present. The situation is not merely past and over but s\
omehow persists and continues to intrude into the present. Such verbs\
 are usually translated into English using the perfect or present ten\
se. I have included under this definition quasi-stative verbs which r\
efer to attributes which were acquired before, but which are assumed \
to continue in some way up to the present moment (Driver 1998, 11; Jo\
\374on and Muraoka 2006, 333; Waltke and O'Connor 1990, 487). In some\
 grammars these are treated separately. However, that creates too man\
y functions for the one perfect form. The term 'classic perfect' and \
indeed the other categorizations of perfect in this grammar all relat\
e to the \256MD+IT\257interpretation \256MD-IT\257of qatal verbs in t\
heir given contexts. The qatal form by itself does not convey these m\
eanings.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Cohortative of praise. The cohortative is ofte\
n used in Psalms to indicate that praise, freely undertaken, has begu\
n. This usage is close to the cohortative of resolve but not identica\
l with it. The emphasis falls not on what the writer is intending to \
do, but what he has already undertaken. \r$
Cohortative of resolve - The cohortative mood normally expresses the \
will of the speaker, but when the speaker has the ability to carry ou\
t what he wants it takes on the coloring of resolve (Van der Merwe et\
 al. 1997, 152; Waltke and O'Connor 1990, 573).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Concluding \256LAHebrew\257\377h\353\377H\351\
\256LAEnglish\257 - A special use of the word \256LAHebrew\257\377h\
\353\377H\351\256LAEnglish\257 found towards the end of several Psalm\
s and approximating in meaning to: the conclusion of the matter is th\
at\205\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Conjunctive waw - Waw used to connect clauses \

Answer 1

sed can also be run as a script, which is easier to develop. Create a file named "nb2txt":

#!/usr/bin/sed -Ef

s/®[^¯]*¯//g
s/-{20,}//g
s/\.{20,}//g

and then (the `./` prefix is needed unless the script's directory is in your PATH):

$ chmod 755 nb2txt
$ ./nb2txt file.nb
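
Since the question mentions many files, the same rules could be applied in a loop. The following is only a sketch: the temporary directory, the `rules.sed` file name, the `.nb` suffix and the `.txt` output names are assumptions made for illustration, and the rules are passed with `sed -E -f` rather than through the shebang line. ASCII `<` and `>` stand in for the tag delimiters so that the example does not depend on the encoding of the input.

```shell
# Sketch: apply one sed script to every *.nb file in a directory.
workdir=$(mktemp -d)

# Rules file; ASCII < and > stand in for the Nota Bene delimiters here.
cat > "$workdir/rules.sed" <<'EOF'
s/<[^>]*>//g
s/-{20,}//g
s/\.{20,}//g
EOF

# Two made-up input files.
printf '<LR1><MD+BO>Glossary<MD-BO>\n' > "$workdir/a.nb"
printf '<LL0LI,0LI>Anaptyxis - a vowel.\n' > "$workdir/b.nb"

# Write a cleaned .txt file next to each .nb file.
for f in "$workdir"/*.nb; do
    sed -E -f "$workdir/rules.sed" "$f" > "${f%.nb}.txt"
done

cat "$workdir/a.txt" "$workdir/b.txt"
```

Writing to a separate `.txt` file leaves the original Nota Bene files untouched, which seems safer while the rules are still being developed.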

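Answer 2

Your regular expression uses `\|` (alternation in GNU sed, a literal bar in most other sed implementations) and `\+` (one or more occurrences in GNU sed, a literal `+` in most other implementations). If you use GNU sed, this pattern would remove anything like `®MD-IT¯` or `®MDDDDDBO¯`, but not the literal `®MD+BO¯`. With a different implementation, it probably finds no match at all.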
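
One way to sidestep the `\+` difference is a bracket expression: `[+]` matches a literal plus sign in both basic and extended POSIX regular expressions, so the commands below should behave the same in GNU and non-GNU sed. This is a minimal sketch; the sample string is invented and uses ASCII `<` and `>` in place of the real delimiters.

```shell
# [+] is a literal plus in both BRE and ERE, unlike \+ whose meaning varies.
sample='keep<MD+BO>this'          # hypothetical ASCII stand-in for a tag

# Basic regular expressions (plain sed)
bre_result=$(printf '%s\n' "$sample" | sed 's/<MD[+]BO>//g')

# Extended regular expressions (sed -E)
ere_result=$(printf '%s\n' "$sample" | sed -E 's/<MD[+]BO>//g')

printf '%s\n%s\n' "$bre_result" "$ere_result"
```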

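It is better to use extended regular expressions, which most versions of sed have supported for years. Note that in extended syntax a literal plus sign must be written `\+`:

sed -nE 's/®MD-IT¯|®MD\+BO¯|®MDNM¯®LL\.8LI,0LI¯|®LL0LI,0LI¯|… and so on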

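I also suggest removing the empty alternatives (the `\|` at the start and end of the pattern), although they do no harm in this case.

The endless runs of `\.` and `-` should be replaced by `\.{42}` and `-{23}`, using the actual number of dots or dashes, or perhaps by `\.{10,}` and `-{10,}` to remove any run of 10 or more dots or dashes.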
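
As a small illustration of the interval syntax, the sketch below removes runs of 10 or more dots or dashes while leaving single dots and hyphens alone. The input line is made up for the example.

```shell
# Interval expressions with sed -E: remove runs of 10 or more dots or dashes.
# The input line is invented for illustration (12 dots, then 12 dashes).
line='Intro............12, a-b------------c.'
cleaned=$(printf '%s\n' "$line" | sed -E 's/\.{10,}//g; s/-{10,}//g')
printf '%s\n' "$cleaned"
```

The hyphen in `a-b` and the final full stop survive because they are shorter than the minimum run length.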

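Answer 3

From the `sed -n l` listing it is clear that the file contains many occurrences of character 174 (decimal; 256 in octal) and character 175 (decimal; 257 in octal), listed as `\256` and `\257`. Interpreted as single-byte characters, `\256` corresponds to Unicode `\xae` (hex code ae, or 256 octal), that is `®`, and `\257` corresponds to Unicode `\xaf` (hex code af, or 257 octal), that is `¯`: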

$ printf '\256 \257 \n' | iconv -f WINDOWS-1252 -t utf8
® ¯

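This assumes UTF-8 is the default encoding, as is common on Linux.

These bytes seem to be related to some internal start and end markers of the .nb format. Removing every string that starts with `\xae` and ends with `\xaf` seems to get us closer to your request: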

$ sed 's/®[^¯]*¯//g' test
Glossary    
A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
Beth essentiae - ÿHá that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).
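
If the file is still in its original single-byte encoding, a `®` typed in a UTF-8 terminal is two bytes and may not match the single `\256` byte in the file. One possible workaround, sketched below, is to strip the tags at the byte level under `LC_ALL=C` and convert to UTF-8 afterwards. The function name `nb_strip` is hypothetical, and the sketch assumes Windows-1252 input (as in the `iconv` call above) and CR LF line endings (the `\r$` in the listing).

```shell
# Sketch: strip Nota Bene markup at the byte level, then convert to UTF-8.
# Assumptions: single-byte input (Windows-1252) and CR LF line endings.
nb_strip() {
    open=$(printf '\256')     # byte 0xAE, shown as \256 by `sed -n l`
    close=$(printf '\257')    # byte 0xAF, shown as \257
    cr=$(printf '\r')
    # LC_ALL=C makes sed treat every byte as one character.
    LC_ALL=C sed -e "s/${open}[^${close}]*${close}//g" -e "s/${cr}\$//" |
        iconv -f WINDOWS-1252 -t UTF-8
}

# Build a small test input from raw bytes (\374 is u-umlaut in Windows-1252).
sample_out=$(printf '\256LL0LI,0LI\257\256LR1\257Jo\374on \256MD+IT\257x\256MD-IT\257\r\n' | nb_strip)
printf '%s\n' "$sample_out"
```

Leftover bytes such as the `ÿHá` sequence in the last line of the output above are not touched by this sketch; they would need separate handling.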
