Find the max date in a single column across multiple rows
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
# aggregate can be used for this type of thing
d = aggregate(df$date2,by=list(df$id),max)
# And merge the result of aggregate
# with the original data frame
df2 = merge(df,d,by.x=1,by.y=1)
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 2008-01-23
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 2008-12-12
5 3 12-12-08 2008-12-12 2008-12-12
Edit: Since you want the last column to be "empty" when the date does not match the max date, you can try the next line.
df2[df2[,3]!=df2[,4],4]=NA
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 <NA>
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 <NA>
5 3 12-12-08 2008-12-12 2008-12-12
Of course, it is always nice to clean up the colnames, etc., but I leave that for you.
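The column clean-up left to the reader above could look something like the following. This is only a sketch in base R: the column name maxdate is an arbitrary choice, not something from the original answer, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so this snippet is self-contained.
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Naming the grouping list gives the key column a readable name ("id");
# the aggregated column is still called "x" by default.
d <- aggregate(df$date2, by = list(id = df$id), FUN = max)

# Rename "x" to something descriptive ("maxdate" is an assumed name).
names(d)[names(d) == "x"] <- "maxdate"

# Merge by name instead of by column position.
df2 <- merge(df, d, by = "id")

# Blank out the rows whose date is not the maximum for their id.
df2$maxdate[df2$date2 != df2$maxdate] <- NA
df2
```

Merging by name rather than by position keeps the code working if the column order of either data frame changes.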
Another approach is to use the plyr package:
library(plyr)
ddply(df, "id", summarize, max = max(date2))
# id max
#1 1 2008-01-23
#2 2 2007-11-30
#3 3 2008-12-12
Now this isn't in the format you were after, as it only shows each id once. Never fear, we can use transform instead of summarize:
ddply(df, "id", transform, max = max(date2))
# id date date2 max
#1 1 01-11-07 2007-11-01 2008-01-23
#2 1 23-01-08 2008-01-23 2008-01-23
#3 2 30-11-07 2007-11-30 2007-11-30
#4 3 12-12-08 2008-12-12 2008-12-12
#5 3 17-12-07 2007-12-17 2008-12-12
As in @seandavi's answer, this repeats the max date for each id. If you want to change the duplicates to NA, something like this will do the job:
within(ddply(df, "id", transform, max = max(date2)), max[max != date2] <- NA)
Adding a dplyr solution in case someone is looking:
library(dplyr)
df %>%
group_by(id) %>%
mutate(max = if_else(date2 == max(date2), date2, as.Date(NA)))
Result:
# A tibble: 5 x 4
# Groups: id [3]
id date date2 max
<dbl> <fctr> <date> <date>
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 NA
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 NA
5 3 12-12-08 2008-12-12 2008-12-12
library(sqldf)
tables<- '(SELECT * FROM df
)
AS t1,
(SELECT id,max(date2) date2 FROM df GROUP BY id
)
AS t2'
out<-fn$sqldf("SELECT t1.*,t2.date2 mdate FROM $tables WHERE t1.id=t2.id")
out$mdate<-as.Date(out$mdate)
out$mdate[out$date2!=out$mdate]<-NA
# id date date2 mdate
#1 1 01-11-07 2007-11-01 <NA>
#2 1 23-01-08 2008-01-23 2008-01-23
#3 2 30-11-07 2007-11-30 2007-11-30
#4 3 12-12-08 2008-12-12 2008-12-12
#5 3 17-12-07 2007-12-17 <NA>
You cannot use 0 as a Date value, so you will either need to abandon keeping it as a Date or accept an NA value:
# Date values:
df$maxdt <- ave(df$date2, df$id,
FUN=function(x) ifelse( x == max(x), as.character(x), NA) )
str(ave(df$date2, df$id, FUN=function(x) ifelse( x == max(x), as.character(x), NA) ) )
# Date[1:5], format: "2008-01-23" NA "2007-11-30" NA "2008-12-12"
The ifelse machinery does some strange type checking that defeats using just x as the second argument above, but still returns a Date-class vector. Go figure! Below is the character vector option.
# Character values:
df$maxdt <- ave(as.character(df$date2), df$id,
FUN=function(x) ifelse( x == max(x), x, "0") )
ave(as.character(df$date2), df$id, FUN=function(x) ifelse( x == max(x), x, "0") )
[1] "2008-01-23" "0" "2007-11-30" "0" "2008-12-12"
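The ifelse behaviour mentioned above can be seen in isolation. The sketch below is not from the original answer: it shows that ifelse returns plain numbers when handed Date values, and then uses replace() as one possible way to keep the Date class. The data frame is rebuilt so the snippet runs on its own.

```r
x <- as.Date(c("2008-01-23", "2007-11-01"))

# ifelse() builds its result from the logical test vector, so the
# "Date" class of x is lost and day counts come back instead.
via_ifelse <- ifelse(x == max(x), x, NA)
class(via_ifelse)   # "numeric"

# replace() assigns NA into a copy of x, so the class survives.
via_replace <- replace(x, x != max(x), NA)
class(via_replace)  # "Date"

# The same idea applied per id with ave(), on the example data.
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")
df$maxdt <- ave(df$date2, df$id,
                FUN = function(x) replace(x, x != max(x), NA))
df
```

This avoids the round trip through character values, at the cost of still having NA rather than 0 in the non-maximum rows.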
I found this to help when I want to see the min/max date of a column:
Max: head(df %>% distinct(date) %>% arrange(desc(date)))
Min: head(df %>% distinct(date) %>% arrange(date))
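The max version sorts the date column in descending order, so the max appears first. The min version sorts in ascending order, so the min appears first. The column has to be of class Date for this to work (date2 in the example data above); sorting a character column such as date orders the strings, not the dates. You need the dplyr package for this.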
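If dplyr is not available, roughly the same inspection can be done in base R. This is a minimal sketch, assuming the column of interest is a Date column such as date2 from the example data, which is rebuilt here so the snippet runs on its own.

```r
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")

# Earliest and latest date in one call.
date_range <- range(df$date2)
date_range

# Distinct dates, newest first (the base R counterpart of
# distinct() followed by arrange(desc())).
newest_first <- sort(unique(df$date2), decreasing = TRUE)
head(newest_first)
```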