There are several ways to represent a time index (sequence of dates or date-times) in R. Table 1 summarizes the main time index classes available in R.
Class | Package | Description |
---|---|---|
chron | chron | Represent calendar dates and times within the day as the number of days since an origin (by default 1970-01-01), with the time of day stored as a fraction of a day. Does not control for time zones. |
Date | base | Represent calendar dates as the number of days since 1970-01-01. |
yearmon | zoo | Represent monthly data. Internally it holds the data as the year plus 0 for January, 1/12 for February, 2/12 for March and so on, so that its internal representation is the same as that of the ts class with frequency = 12. |
yearqtr | zoo | Represent quarterly data. Internally it holds the data as the year plus 0 for Quarter 1, 1/4 for Quarter 2 and so on, so that its internal representation is the same as that of the ts class with frequency = 4. |
POSIXct | base | Represent calendar dates and times within the day as the (signed) number of seconds since the beginning of 1970 (UTC) as a numeric vector. Supports various time zone specifications (e.g., GMT, PST, EST). |
POSIXlt | base | Represent local dates and times within the day as a named list of vectors with date-time components. |
timeDate (S4) | timeDate | The Rmetrics timeDate S4 class fulfils the conventions of the ISO 8601 standard as well as of the ANSI C and POSIX standards. Beyond these standards, Rmetrics has added the “Financial Center” concept, which makes it possible to handle data records collected in different time zones and combine them so that the time stamps are always correct with respect to your personal financial center, or alternatively to the GMT reference time. timeDate is nearly compatible with the timeDate class in TIBCO's S-PLUS. |
The base R Date class handles dates (without times) and is the recommended class for representing financial data observed on discrete dates without regard to the time of day (e.g., daily closing prices). The base R POSIXct and POSIXlt classes allow for dates and times with control for time zones. They are the recommended classes for representing dates associated with financial data observed at particular times within a day (e.g., prices or quotes observed during the trading hours of a day). The chron class is similar but is not used as often as the POSIXt classes (POSIXt is the parent class shared by POSIXct and POSIXlt). The yearmon and yearqtr classes from the zoo package are convenient for representing regularly spaced monthly and quarterly data, respectively, when it is not necessary to specify exactly when during the month or quarter the data are observed. The Rmetrics timeDate class is an S4 class very similar to the S-PLUS timeDate class. It is based on the POSIX standards and is used throughout the Rmetrics suite of packages.
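To check the storage schemes described above, you can unclass or inspect the base R classes directly. The following is a minimal sketch using only base R. The object names are arbitrary and the dates were chosen so that the stored values are easy to read.

```r
# Inspect how base R stores each time index class
d  = as.Date("1970-01-02")
ct = as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
lt = as.POSIXlt(ct)

unclass(d)      # Date: days since 1970-01-01, here 1
as.numeric(ct)  # POSIXct: seconds since 1970-01-01 00:00:00 UTC, here 60
lt$min          # POSIXlt: a list of components; the minute is 1
lt$year         # POSIXlt counts years from 1900, so 70
class(ct)       # "POSIXct" "POSIXt": both classes inherit from POSIXt
```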
Throughout this tutorial, I will use the following R options:
options(digits = 4, width = 70)
Date Class (base R)
Use the Date class to represent a time index that involves only dates, not times within a day. By default, the Date class represents dates internally as the number of days since January 1, 1970. You create Date objects from a character string representing a date using the as.Date() function. The default format is “YYYY/m/d” or “YYYY-m-d”, where YYYY represents the four-digit year, m the month number and d the day of the month. For example,
my.date = as.Date("1970/1/1")
my.date
## [1] "1970-01-01"
class(my.date)
## [1] "Date"
as.numeric(my.date)
## [1] 0
myDates = c("2013-12-19", "2003-12-20")
as.Date(myDates)
## [1] "2013-12-19" "2003-12-20"
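The two default separators give the same Date, and the underlying value is the day count from 1970-01-01. Here is a short sketch with an arbitrary date:

```r
d1 = as.Date("2013/12/19")   # slash-separated default format
d2 = as.Date("2013-12-19")   # dash-separated default format
identical(d1, d2)            # TRUE: same underlying Date
as.numeric(d2)               # 16058 days after 1970-01-01
```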
Use the format argument to specify the input format of the date if it is not in the default format:
as.Date("1/1/1970", format = "%m/%d/%Y")
## [1] "1970-01-01"
as.Date("January 1, 1970", format = "%B %d, %Y")
## [1] "1970-01-01"
as.Date("01JAN70", format = "%d%b%y")
## [1] "1970-01-01"
Notice that the output is always displayed in the form “YYYY-mm-dd”, regardless of the input format. To change the displayed format of a date, use the format() function:
format(my.date, "%b %d, %Y")
## [1] "Jan 01, 1970"
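Parsing with as.Date() and printing with format() use the same codes, so a non-default format can be round-tripped. Here is a minimal sketch with day/month/year strings. It uses only numeric codes so that the result does not depend on the locale.

```r
x = c("19/12/2013", "20/12/2003")    # day/month/year input strings
d = as.Date(x, format = "%d/%m/%Y")  # parse with an explicit format
d                                    # printed in the default form
format(d, "%d/%m/%Y")                # the same codes recover x
format(d, "%Y%m%d")                  # a compact form, e.g. for file names
```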
Some date formats provide insufficient information to be unambiguously represented as a Date object. For example, a month and year without a day cannot be converted:
as.Date("Jan 1970", format = "%b %Y")
## [1] NA
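If a default day is acceptable, one workaround is to paste a day onto the string before converting. This sketch assumes the first of the month. R does not infer that choice, so pick the day that suits your data.

```r
ym = c("1970-01", "2013-12")     # year-month strings without a day
as.Date(ym, format = "%Y-%m")    # NA NA: no day information
as.Date(paste0(ym, "-01"))       # assume the first day of each month
```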
Table 2 below gives the most commonly used date format codes (see ?strptime for the complete list).
Code | Value | Example |
---|---|---|
%d | Day of the month (decimal number) | 23 |
%m | Month (decimal number) | 11 |
%b | Month (abbreviated) | Jan |
%B | Month (full name) | January |
%y | Year (2 digit) | 90 |
%Y | Year (4 digit) | 1990 |
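Applying each numeric code from Table 2 to a single date reproduces the Example column. The sketch leaves out the month-name codes %b and %B because their output depends on the locale.

```r
d = as.Date("1990-11-23")
codes = c("%d", "%m", "%y", "%Y")
# format() the date with each code; vapply() names the results by code
vapply(codes, function(f) format(d, f), character(1))
```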
Recall that dates are recorded internally as the number of days since 1970-01-01. As a result, you can also create a Date object from numeric data. One way to convert a numeric variable to a Date object is to assign its class with the class() function:
my.date = 0
class(my.date) = "Date"
my.date
## [1] "1970-01-01"
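The same class assignment works on a whole vector of day counts, and structure() does it in a single call. Here is a minimal sketch:

```r
days = c(0, 1, 365)
class(days) = "Date"                # reinterpret the day counts as dates
days                                # "1970-01-01" "1970-01-02" "1971-01-01"
structure(c(0, 31), class = "Date") # same idea without modifying in place
```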
Another way is to use the as.Date() function, adding the optional argument origin if the origin date is different from the default 1970-01-01. For example, to determine the date that is 32500 days after 1900-01-01, use
as.Date(32500, origin = as.Date("1900-01-01"))
## [1] "1988-12-25"
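Subtracting the origin recovers the original day count. This is a useful check when importing day numbers from another system. In older versions of R, as.Date() on a number fails without an origin, so supplying origin explicitly is the portable choice. Here is a short sketch that reuses the 1900-01-01 origin:

```r
origin = as.Date("1900-01-01")
d = as.Date(32500, origin = origin)  # "1988-12-25"
as.numeric(d - origin)               # 32500: back to the original day count
as.numeric(d)                        # 6933: days since 1970-01-01 instead
```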
Consider the Date object
my.date
## [1] "1970-01-01"
Suppose I want to extract the year component from this object as a character string or as a number. I can do this using the format() function:
myYear = format(my.date, "%Y")
myYear
## [1] "1970"
class(myYear)
## [1] "character"
as.numeric(myYear)
## [1] 1970
as.numeric(format(my.date, "%Y"))
## [1] 1970
By specifying different format codes in the format() function, I can extract other components of the date such as the month or day.
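The sketch below extracts the month and day as numbers with format(). It then shows an alternative technique: convert to POSIXlt and read its components. Note that POSIXlt counts months from 0 and years from 1900.

```r
d = as.Date("2013-12-19")
as.integer(format(d, "%m"))  # month via format(): 12
as.integer(format(d, "%d"))  # day via format(): 19

lt = as.POSIXlt(d)           # list of date components
lt$year + 1900               # years are counted from 1900: 2013
lt$mon + 1                   # months are 0-based, so add 1: 12
lt$mday                      # day of the month: 19
```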
Additionally, the weekdays(), months() and quarters() functions extract specific components of Date objects, and julian() returns the number of days since a given origin:
weekdays(my.date)
## [1] "Thursday"
months(my.date)
## [1] "January"
quarters(my.date)
## [1] "Q1"
julian(my.date, origin = as.Date("1900-01-01"))
## [1] 25567
## attr(,"origin")
## [1] "1900-01-01"
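weekdays() and months() return names that depend on the current locale. For a locale-independent alternative, you can read the numeric weekday from POSIXlt (0 is Sunday) and derive the quarter from the month number. The quarter formula below is written here for illustration and is not a base R function.

```r
d = as.Date(c("1970-01-01", "2013-12-19"))
lt = as.POSIXlt(d)
lt$wday                 # day of week, 0 = Sunday: both dates are Thursdays (4)
lt$mon %/% 3 + 1        # quarter number from the 0-based month: 1 4
as.numeric(julian(d))   # julian() defaults to origin 1970-01-01: 0 16058
```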
Having a numeric representation for dates allows for some simple date arithmetic. For example,
my.date
## [1] "1970-01-01"
my.date + 1
## [1] "1970-01-02"
my.date - 1
## [1] "1969-12-31"
my.date + 31
## [1] "1970-02-01"
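Addition is vectorized, so adding an integer sequence builds a run of consecutive dates. The offsets are always in days. This means + 31 lands on the same day of the next month only when the starting month has 31 days. Here is a minimal sketch:

```r
start = as.Date("1970-01-01")
week = start + 0:6           # seven consecutive days
week[7]                      # "1970-01-07"
as.Date("1970-02-01") + 31   # "1970-03-04", not March 1: February 1970 has 28 days
```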
Logical comparisons can also be made:
my.date
## [1] "1970-01-01"
my.date1 = as.Date("1980-01-01")
my.date1 > my.date
## [1] TRUE
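Comparisons are vectorized too, which makes it easy to restrict a vector of dates to a window. Here is a short sketch with arbitrary dates:

```r
dates = as.Date(c("1969-06-30", "1970-01-01", "1975-05-05", "1980-01-01"))
lower = as.Date("1970-01-01")
upper = as.Date("1980-01-01")
dates[dates >= lower & dates < upper]  # "1970-01-01" "1975-05-05"
range(dates)                           # earliest and latest dates
```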
Subtracting two Date objects creates a difftime object, which shows the number of days between the two dates:
diff.date = my.date1 - my.date
diff.date
## Time difference of 3652 days
class(diff.date)
## [1] "difftime"
as.numeric(diff.date)
## [1] 3652
my.date + diff.date
## [1] "1980-01-01"
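Subtraction chooses the units of a difftime automatically. The difftime() function lets you request them explicitly, and as.numeric() can convert to other units. Here is a minimal sketch that expresses the same ten-year gap in weeks:

```r
d0 = as.Date("1970-01-01")
d1 = as.Date("1980-01-01")
difftime(d1, d0, units = "days")    # Time difference of 3652 days
w = difftime(d1, d0, units = "weeks")
as.numeric(w)                       # 3652 / 7, about 521.7 weeks
as.numeric(w, units = "days")       # converted back to days: 3652
```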
Sequences of dates are often required when constructing time series objects. The base R function seq() (through its method seq.Date() for objects of class Date) can create many types of date sequences. The arguments to seq.Date() are
args(seq.Date)
## function (from, to, by, length.out = NULL, along.with = NULL,
## ...)
## NULL
where from specifies the starting date, to specifies the ending date and by specifies the increment of the sequence. The by increment can be a number (taken to be in days), a difftime object, or a character string containing one of “day”, “week”, “month”, “quarter” or “year”. The string can be preceded by a (positive or negative) integer and a space, or followed by “s”. For example, to create a sequence of Date objects spaced two months apart, starting 1993-03-01 and ending 2003-03-01, use
my.dates = seq(as.Date("1993/3/1"), as.Date("2003/3/1"), "2 months")
head(my.dates)
## [1] "1993-03-01" "1993-05-01" "1993-07-01" "1993-09-01" "1993-11-01"
## [6] "1994-01-01"
tail(my.dates)
## [1] "2002-05-01" "2002-07-01" "2002-09-01" "2002-11-01" "2003-01-01"
## [6] "2003-03-01"
Alternatively, specify the number of dates with length.out instead of the end date:
my.dates = seq(from = as.Date("1993/3/1"), by = "2 months", length.out = 61)
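Other documented increments for seq.Date() include "week" and "quarter", and by can also be a plain number of days. Here is a short sketch:

```r
start = as.Date("2013-01-01")
seq(start, by = "quarter", length.out = 4)  # first day of each quarter
seq(start, as.Date("2013-01-29"), by = 7)   # every 7 days: 5 dates
seq(start, by = "-1 week", length.out = 3)  # a negative step goes backwards
```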
The seq() function can also be used to determine the date that is a specified number of days, weeks, months or years from a given date. For example, to find the date that is 5 months after today's date, use
Sys.Date()
## [1] "2014-05-13"
seq(from = Sys.Date(), by = "5 months", length.out = 2)[2]
## [1] "2014-10-13"
While the above is a clever solution, it is not very intuitive. The lubridate package, described later on, provides a much easier solution.
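In the meantime, the seq() trick can be wrapped in a small helper. add_months() is a hypothetical name used for illustration; it is not a function from base R or any package. Be aware that month arithmetic on end-of-month dates rolls over into the following month.

```r
# Hypothetical helper: the date n months after d, computed with seq.Date()
add_months = function(d, n) {
  seq(d, by = paste(n, "months"), length.out = 2)[2]
}

add_months(as.Date("2014-05-13"), 5)  # "2014-10-13"
add_months(as.Date("2013-01-31"), 1)  # "2013-03-03": there is no February 31
```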
Given a data set of Date objects, it is possible to graphically summarize the distribution of dates using the hist() function. For example, the following code simulates 500 random dates between 2013-01-01 and 2014-01-01 and plots a histogram summarizing the number of dates within each month:
rint = round(runif(500) * 365)
startDate = as.Date("2013-01-01")
myDates = startDate + rint
head(myDates)
## [1] "2013-01-30" "2013-09-06" "2013-08-31" "2013-01-18" "2013-07-10"
## [6] "2013-01-06"
hist(myDates, breaks = "months", freq = TRUE, main = "Distribution of Dates by Month",
col = "slateblue1", xlab = "", format = "%b %Y", las = 2)
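You can also tabulate the counts behind the histogram directly. This minimal sketch simulates the dates again with a fixed seed, so the results are reproducible but will not match the histogram above. The seed value and the name sim.dates are arbitrary. The counts per month come from format() and table().

```r
set.seed(123)                               # arbitrary seed for reproducibility
rint = round(runif(500) * 365)
sim.dates = as.Date("2013-01-01") + rint
counts = table(format(sim.dates, "%Y-%m"))  # one cell per year-month
counts
sum(counts)                                 # 500: every date falls in some month
```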