My data frame looks like this:
df=data.frame(
eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
)
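In reality I have many more columns, which do not always match the string "eye_problemsdisorders_f6148", and many more rows.
What I want to do is create a new column, say "case", in which every row where the string "A" appears at least once in any column has the value "1", and every other row has the value "0". So in the example above, the "case" column would have these values: 1,1,0,1,1,1,0,0,1,1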
Answer 1
Given
> df=data.frame(
+ eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
+ eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
+ eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
+ eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
+ eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
+ )
then
> f = function(x) any(x == "A", na.rm = TRUE)
>
> apply(df, MARGIN = 1, FUN = f)
[1] TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
>
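The na.rm = TRUE argument is doing real work here. As a minimal sketch (the vector x is a made-up example, not taken from the data above): comparing NA with "A" yields NA, so without na.rm a row that has missing values and no "A" would come out as NA rather than FALSE.

```r
# Hypothetical row with a missing value and no "A"
x <- c("B", NA, "C")

# The comparison gives FALSE, NA, FALSE, so any() cannot decide and returns NA
without_na_rm <- any(x == "A")

# With na.rm = TRUE the NA comparison is dropped before any() is evaluated
with_na_rm <- any(x == "A", na.rm = TRUE)

print(without_na_rm)
print(with_na_rm)
```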
Coerce the logical TRUE/FALSE values to numeric 1/0 and add them as a new column:
> df$case <- as.numeric(apply(df, MARGIN = 1, FUN = f))
>
>
> df
   eye_problemsdisorders_f6148_0_1 eye_problemsdisorders_f6148_0_2
1                                A                               B
2                                C                               C
3                                D                            <NA>
4                             <NA>                               A
5                                D                               C
6                                A                               B
7                                C                            <NA>
8                             <NA>                            <NA>
9                                B                               A
10                               A                               D
   eye_problemsdisorders_f6148_0_3 eye_problemsdisorders_f6148_0_4
1                                C                               D
2                                A                               D
3                                D                            <NA>
4                                D                               B
5                                B                               A
6                                A                               C
7                             <NA>                            <NA>
8                             <NA>                               C
9                                A                               A
10                               B                               B
   eye_problemsdisorders_f6148_0_5 case
1                                C    1
2                                C    1
3                             <NA>    0
4                                D    1
5                                B    1
6                                C    1
7                             <NA>    0
8                                D    0
9                                D    1
10                               B    1
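Since the real data has many more rows and columns, a vectorized variant may be worth trying. This is only a sketch of an alternative, not part of the approach above: it compares the whole data frame with "A" at once and counts matches per row with rowSums. The cols vector is an assumption introduced for illustration; here it simply holds every column name, and you would replace it with whichever columns you actually want searched.

```r
# Same example data as in the question
df <- data.frame(
  eye_problemsdisorders_f6148_0_1 = c("A","C","D",NA,"D","A","C",NA,"B","A"),
  eye_problemsdisorders_f6148_0_2 = c("B","C",NA,"A","C","B",NA,NA,"A","D"),
  eye_problemsdisorders_f6148_0_3 = c("C","A","D","D","B","A",NA,NA,"A","B"),
  eye_problemsdisorders_f6148_0_4 = c("D","D",NA,"B","A","C",NA,"C","A","B"),
  eye_problemsdisorders_f6148_0_5 = c("C","C",NA,"D","B","C",NA,"D","D","B")
)

# Columns to search (assumption: all of them; substitute your own selection)
cols <- names(df)

# df[cols] == "A" is a logical matrix; rowSums counts the TRUEs in each row,
# ignoring NA, and a count above zero means "A" appears at least once
df$case <- as.numeric(rowSums(df[cols] == "A", na.rm = TRUE) > 0)

print(df$case)
```

Fixing cols before creating the new column also keeps "case" itself out of the search if the line is run a second time.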
Answer 2
I would vote for the short answer again, but here is one anyway:
awk '{if ($0 ~ /A/) {printf 1} else {printf 0}}' datafile
printf is needed here because print would append a newline after every value. If you want or need commas, you can add them.