How to create a new column based on the values in multiple columns that match a specific string?


My data frame looks like this:

df=data.frame(
  eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
  eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
  eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
  eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
)

In reality I have many more columns, whose names do not always match the string "eye_problemsdisorders_f6148", and many more rows.

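What I want to do is create a new column, say "case", that has the value 1 for every row in which the string "A" appears at least once in any of these columns, and 0 otherwise. In the example above, the "case" column would therefore have the values 1,1,0,1,1,1,0,0,1,1.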
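A vectorized sketch of the same idea that also copes with the extra columns. It assumes the relevant columns can be recognized by the substring "eye_problemsdisorders_f6148" in their names (substitute your own pattern); `cols` is only an illustrative variable name. `df[cols] == "A"` yields a logical matrix in which missing cells stay `NA`, so `rowSums(..., na.rm = TRUE)` counts the "A" hits per row.

```r
# Sample data from the question
df <- data.frame(
  eye_problemsdisorders_f6148_0_1 = c("A","C","D",NA,"D","A","C",NA,"B","A"),
  eye_problemsdisorders_f6148_0_2 = c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  eye_problemsdisorders_f6148_0_3 = c("C","A","D","D","B","A",NA,NA,"A","B"),
  eye_problemsdisorders_f6148_0_4 = c("D","D",NA,"B","A","C",NA,"C","A","B"),
  eye_problemsdisorders_f6148_0_5 = c("C","C",NA,"D","B","C",NA,"D","D","B")
)

# Logical index of the columns whose names contain the target string
# (assumed pattern; adjust to your real column names)
cols <- grepl("eye_problemsdisorders_f6148", names(df), fixed = TRUE)

# Compare every selected cell with "A" (NA cells give NA), count the TRUEs
# per row while ignoring NA, and turn "at least one hit" into 1/0
df$case <- as.integer(rowSums(df[cols] == "A", na.rm = TRUE) > 0)

df$case
```

Because `rowSums()` works on the whole matrix at once, this avoids calling an R function once per row, which may matter when there are many rows.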

Answer 1

Given

> df=data.frame(
+   eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
+   eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
+   eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
+   eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
+   eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
+ )

then

> f = function(x) any(x == "A", na.rm = TRUE)
> 
> apply(df, MARGIN = 1, FUN = f)
 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
> 
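Why `na.rm = TRUE` matters: rows 7 and 8 contain missing values but no "A", and comparing `NA` with "A" gives `NA` rather than `FALSE`. A small illustration (the vector `x` is made up for the example):

```r
# A row with missing values and no "A"
x <- c(NA, "B", "C")

x == "A"                     # NA FALSE FALSE: NA cells compare as NA
any(x == "A")                # NA: any() cannot rule out a match
any(x == "A", na.rm = TRUE)  # FALSE: the NA is dropped before deciding

# A single TRUE settles any() even without na.rm
any(c(NA, "A") == "A")       # TRUE
```

Without `na.rm = TRUE`, rows like 7 and 8 would end up as `NA` in the new column instead of 0.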

Coerce the logical TRUE/FALSE values to numeric 1/0 and add them as a new column:

> df$case <- as.numeric(apply(df, MARGIN = 1, FUN = f))
> 
> 
> df
   eye_problemsdisorders_f6148_0_1 eye_problemsdisorders_f6148_0_2
1                                A                               B
2                                C                               C
3                                D                            <NA>
4                             <NA>                               A
5                                D                               C
6                                A                               B
7                                C                            <NA>
8                             <NA>                            <NA>
9                                B                               A
10                               A                               D
   eye_problemsdisorders_f6148_0_3 eye_problemsdisorders_f6148_0_4
1                                C                               D
2                                A                               D
3                                D                            <NA>
4                                D                               B
5                                B                               A
6                                A                               C
7                             <NA>                            <NA>
8                             <NA>                               C
9                                A                               A
10                               B                               B
   eye_problemsdisorders_f6148_0_5 case
1                                C    1
2                                C    1
3                             <NA>    0
4                                D    1
5                                B    1
6                                C    1
7                             <NA>    0
8                                D    0
9                                D    1
10                               B    1

Answer 2

Once again I'll vote for a short answer, but here is one:

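awk '{ hit = 0; for (i = 1; i <= NF; i++) if ($i == "A") hit = 1; printf "%d", hit }' datafile

printf is used here because print would add a newline after each value. Add commas if you want or need them. Note that a plain `$0 ~ /A/` test would also match the "A" in "NA", so each field is compared with "A" exactly; this assumes a whitespace-separated file with no header row and no quoted values.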
