Workaround in R

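I want to break lines (insert a newline followed by #' ) when they exceed 100 columns and are part of a chunk, that is, lines beginning with #' (written as #\x27 in the Perl code below).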

My solution does not work when there are multiple chunks.

Example file:

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

My attempt (works if there is only one chunk):

perl -0777 -pe '
  s{#\x27.*#\x27}{                          q{ gets lines from #\x27 to #\x27 (chunk) };
    ($r = $&) =~ s/\n!\n#\x27//g;           q{ removes breaks except followed by #\x27 }; 
    $r =~ s/\G.{0,100}(\s|.$)\K/\n#\x27 /g; q{ before column 100 adds break + #\x27 };
    $r =~ s/#\x27 #\x27/#\x27/g;            q{ removes duplicated #\x27 };
    $r =~ s/\n\n/\n/g;                      q{ removes duplicated breaks };
    $r
  }gse' < chunks.txt

Expected output (repeated twice, once per chunk):

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

Workaround in R

psum <- function(...,na.rm=FALSE) {
  rowSums(do.call(cbind,list(...)),na.rm=na.rm)
}    

gblines<-readLines("chunks.txt")

newgblines<-character()
i<-1
j<-1
repeat {
  newgblines[j] <- gblines[i]
    if (grepl("^#'", newgblines[j]) && nchar(newgblines[j]) > 100) { # lines beginning with #' that are longer than 100 characters
      repeat{
        greps<-gregexpr(pattern ="\\s",newgblines[j])[[1]] # get position of spaces
        lenG<-length(greps)
        sums<-psum(-greps , rep(100,lenG ) )               # calculate which space is closest to col. 100
        index <- which(sums>0)
        minSums<- min(sums[index])
        index2<-which(sums==minSums)                       # index of space in greps
        cutpoint<-greps[index2]
        nchar2<-nchar(newgblines[j])                       # number of chars. in line
        strFirst <-substr(newgblines[j],1,cutpoint)        # cut before col. 100
        strSecond<-substr(newgblines[j],cutpoint+1,nchar2) # segment after col. 100
        newgblines[j]<-strFirst
        j<-j+1
        newgblines[j]<-paste0("#\' ",strSecond)
        if (nchar(strSecond)<=100 ){
          break
        }
      } # 
    } #  if
  i <- i+1
  j <- j+1
  if (i>length(gblines) ){
    break
  }
}
newgblines

Answer 1

You are almost there.

Make the following two changes:

  • Change
    s{#\x27.*#\x27}{
    to
    s{#\x27.*?#\x27$}{

  • and change
    }gse' < fileName
    to
    }mesg' < fileName

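Basically, you are doing a greedy search and replace, so a single match runs from the first #' in the file to the last one. What you need is a search and replace that works chunk by chunk.

A #' marker followed directly by a newline marks the end of a chunk, and the regex .*? is the non-greedy version of .* The m flag makes $ match at the end of every line rather than only at the end of the file.

See the Perl documentation for more details.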

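Answer 2

An alternative, more general answer that does not rely on chunks ending with a bare #' line. It is not perfect, but it works better.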

perl -0777 -pe '
q{ 4 manual entries };
  $max_length = 100;
  $line_filter_pattern = "#\x27 ";                  
  $prefix_pattern = "#\x27 ";         
  $break_point = " ";                               q{ character in which to break lines };
  
  $linebreak_prefix = "\n$prefix_pattern";          q{ \n is linebreak };
  $lp_length = length($linebreak_prefix);

q{act in lines with prefix pattern };

  s{$line_filter_pattern.*?$}{
    ($r2 = $r = $&);

q{    check if splitting makes changes };
      $r2 =~ s/\G.{0,$max_length}($break_point|.$)\K/$linebreak_prefix/gs;
      if(length($r2) > length($r) + $lp_length) {

q{      add breaks and prefixes in a loop way };
        $r = $r2;
        $r =~ s/$linebreak_prefix$//g;
      }
  $r }gsem' < input.file > output.file
