Print all C comments to a separate text file

Question 1

There are already plenty of answers that use shell magic, but I think this can be done more easily with a tool you probably already have, namely GCC.

diff -u <(gcc -fpreprocessed -dD -E main.c) main.c | grep '^+' | cut -c 2-

How does it work?

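gcc -fpreprocessed -dD -E main.c strips all comments from the file and writes the result to stdout
diff -u <(...) main.c takes that output as its first input and compares it with the original file
grep '^+' keeps only the lines that start with +, in other words the comment lines identified in the previous step
cut -c 2- removes the leading + symbol from the output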

No need for super-complicated regular expressions, perl or awk, and it also covers all the edge cases that other answers may miss.
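
To see what the diff, grep and cut stages do without needing a compiler, here is a minimal sketch. It assumes a hand-written stripped.c as a stand-in for the gcc output (the real preprocessor output may differ in whitespace). The file names and the tail -n +3 step are additions for this illustration only: tail skips the two header lines of the unified diff, the second of which (+++ main.c) would otherwise also match ^+.

```shell
#!/bin/sh
# Sketch only: stripped.c is hand-written and stands in for the output of
# `gcc -fpreprocessed -dD -E main.c`, so the example runs without gcc.
tmp=$(mktemp -d)

cat > "$tmp/main.c" <<'EOF'
/* greet the user */
int main(void)
{
    // say hello
    return 0;
}
EOF

# Hypothetical comment-free version: each comment line replaced by an empty line.
cat > "$tmp/stripped.c" <<'EOF'

int main(void)
{

    return 0;
}
EOF

# diff marks lines that exist only in main.c with "+"; tail drops the
# "---"/"+++" header lines, grep keeps the "+" lines, cut removes the "+".
comments=$(diff -u "$tmp/stripped.c" "$tmp/main.c" | tail -n +3 | grep '^+' | cut -c 2-)
printf '%s\n' "$comments"

rm -rf "$tmp"
```

This prints the two comment lines of main.c. Because diff works line by line, a line that carries both code and a comment would be reported whole, not just its comment part.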

Question 2

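This is not as trivial as it seems once you consider things like puts("string with /*"), bearing in mind that a " can itself appear in a character constant, as in ch = '"'.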

Or line continuations:

printf("...");    /\
* yes, this is a comment */
/\
/ and this as well

Or trigraphs.

To handle those, we can adapt this answer to the opposite question so that it prints the comments instead of removing them:

perl -0777 -pe '
  s{
    (?<comment>
      # /* ... */ C comments
      / (?<lc> # line continuation
          (?<bs> # backslash in its regular or trigraph form
            \\ | \?\?/
          )
          (?: \n | \r\n?) # handling LF, CR and CRLF line delimiters
        )* \* .*? \* (?&lc)* /
      | / (?&lc)* / (?:(?&lc) | [^\r\n])* # // C++/C99 comments
    ) |
       "(?:(?&bs)(?&lc)*.|.)*?" # "strings" literals
       | '\''(?&lc)*(?:(?&bs)(?&lc)*(?:\?\?.|.))?(?:\?\?.|.)*?'\'' # (w)char literals
       | \?\?'\'' # trigraph form of ^
       | .[^'\''"/?]* # anything else
  }{$+{comment} eq "" ? "" : "$+{comment}\n"}exsg'

On the contrived example from that other question, which covers most of the corner cases:

#include <stdio.h>
int main()
{
  printf("%d %s %s %c%c%c%c%c %s %s %d\n",
  1-/* comment */-1,
  /\
* comment */
  "/* not a comment */",
  /* multiline
  comment */
  // comment
  /\
/ comment
  // multiline\
comment
  "// not a comment",
  '"' /* comment */ , '"',
  '\'','"'/* comment */,
  '\
\
"', /* comment */
  "\\
" /* not a comment */ ",
  "??/" /* not a comment */ ",
  '??''+'"' /* "comment" */);
  return 0;
}

that gives:

/* comment */
/\
* comment */
/* multiline
  comment */
// comment
/\
/ comment
// multiline\
comment
/* comment */
/* comment */
/* comment */
/* "comment" */
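
A note on the '\'' sequences in the perl command above: the whole perl program is passed as one single-quoted shell argument, and a single-quoted string cannot contain a single quote. The sequence '\'' closes the quote, adds an escaped literal ', and reopens the quote. A minimal sketch, with the variable name quoted chosen only for this illustration:

```shell
#!/bin/sh
# Inside a single-quoted shell string, '\'' ends the quote, emits a
# backslash-escaped ' and starts a new single-quoted string.
quoted='[^'\''"/?]'
printf '%s\n' "$quoted"
```

The value printed is [^'"/?], which is the character class perl actually receives in the "anything else" branch.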

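Getting the line numbers is a bit trickier, because we are running in slurp mode, where the subject is the whole input rather than one line at a time. We can do it by using the (?{code}) regex operator to increment a counter every time a line delimiter (CR, LF or CRLF in C) is found: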

perl -0777 -pe '
  s{
    (?<comment>(?{$l=$n+1})
      /
      (?<lc>  # line continuation
        (?<bs> # backslash in its regular or trigraph form
          \\ | \?\?/
        ) (?<nl>(?:\n|\r\n?) (?{$n++})) # handling LF, CR and CRLF line delimiters
      )*
      (?:
        \* (?: (?&nl) | .)*? \* (?&lc)* / # /* ... */ C comments
        | / (?:(?&lc) | [^\r\n])*         # // C++/C99 comments
      )
    ) |
       "(?:(?&bs)(?&lc)*.|.)*?" # "strings" literals
       | '\''(?&lc)*(?:(?&bs)(?&lc)*(?:\?\?.|.))?(?:\?\?.|.)*?'\'' # (w)char literals
       | \?\?'\'' # trigraph form of ^
       | (?&nl)
       | .[^'\''"/?\r\n]* # anything else
  }{$+{comment} eq "" ? "" : sprintf("%5d %s\n", $l, $+{comment})}exsg'

Which on the same sample gives:

    5 /* comment */
    6 /\
* comment */
    9 /* multiline
  comment */
   11 // comment
   12 /\
/ comment
   14 // multiline\
comment
   17 /* comment */
   18 /* comment */
   21 /* comment */
   26 /* "comment" */
