The DateTimeIndex shared by all the time series.
Collects the RDD as a local TimeSeries.
Returns a TimeSeriesRDD where each time series is differenced with the given order. The new RDD will be missing the first n date-times.
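To illustrate why the first n date-times go missing, here is a minimal sketch on a plain Scala array rather than on a TimeSeriesRDD. It assumes that "order n" means subtracting the value n positions earlier; the library may define the order differently. The object and method names are hypothetical and are not part of the library.

```scala
object DifferencingSketch {
  // Hypothetical helper (not a library call): subtracts from each value the
  // value n positions earlier. The first n positions have no earlier value,
  // so the result is n elements shorter than the input.
  def differencesAtLag(xs: Array[Double], n: Int): Array[Double] = {
    require(n >= 1 && n <= xs.length, "n must be between 1 and the series length")
    Array.tabulate(xs.length - n)(i => xs(i + n) - xs(i))
  }
}
```

For example, `DifferencingSketch.differencesAtLag(Array(1.0, 3.0, 6.0, 10.0), 1)` yields three values for four inputs, which is why the corresponding index would have to drop its first date-time.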
Fills in missing data (NaNs) in each series according to a given imputation method.
The imputation method to use: "linear", "nearest", "next", or "previous".
A TimeSeriesRDD with missing observations filled in.
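As a rough illustration of what the "previous" and "next" methods could mean for a single series, the sketch below fills NaNs in a plain array. The helper names are hypothetical, and the handling of leading or trailing NaNs (left as NaN here) is an assumption of this sketch, not documented library behavior.

```scala
object FillSketch {
  // Hypothetical helper: replaces each NaN with the most recent non-NaN value.
  // Leading NaNs have nothing before them and stay NaN in this sketch.
  def fillPrevious(xs: Array[Double]): Array[Double] =
    xs.scanLeft(Double.NaN)((last, x) => if (x.isNaN) last else x).tail

  // Hypothetical helper: replaces each NaN with the next non-NaN value.
  // Trailing NaNs have nothing after them and stay NaN in this sketch.
  def fillNext(xs: Array[Double]): Array[Double] =
    xs.scanRight(Double.NaN)((x, next) => if (x.isNaN) next else x).init
}
```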
Keeps only the time series whose last observation falls on or after the given end date.
Keeps only the time series whose first observation falls on or before the given start date.
Finds a series in the TimeSeriesRDD with the given key.
Lags each time series in the RDD.
The type of the keys.
The maximum lag.
Whether to include the original time series.
A function that generates the keys of the lagged series.
An RDD of lagged time series.
Applies a transformation to each time series and returns a TimeSeriesRDD with the given index. The caller is expected to ensure that the time series produced line up with the given index.
Applies a transformation to each time series that preserves the time index of this TimeSeriesRDD.
Returns a TimeSeriesRDD where each time series is replaced by its quotients of the given order. The new RDD will be missing the first n date-times.
Returns a TimeSeriesRDD with every instant removed that has a NaN in at least one of the series.
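The filtering rule can be sketched on local collections: an instant survives only if no series has a NaN at that position. The representation below (a vector of Long instants and a map from key to values) and all names are hypothetical stand-ins chosen for illustration, not the library's types.

```scala
object DropNaNInstantsSketch {
  // Hypothetical helper: every array in `series` is assumed to have the same
  // length as `instants`. Returns the surviving instants and the series
  // restricted to those positions.
  def removeInstantsWithNaNs(
      instants: Vector[Long],
      series: Map[String, Array[Double]]): (Vector[Long], Map[String, Array[Double]]) = {
    // Positions at which no series has a NaN.
    val keep = instants.indices.filter(i => series.values.forall(v => !v(i).isNaN))
    val newInstants = keep.map(instants).toVector
    val newSeries = series.map { case (k, v) => k -> keep.map(v(_)).toArray }
    (newInstants, newSeries)
  }
}
```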
Returns a return rate series for each time series. Assumes periodic (as opposed to continuously compounded) returns.
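For reference, the periodic return between two consecutive observations is x(t) / x(t - 1) - 1, whereas a continuously compounded return would be log(x(t) / x(t - 1)). The sketch below computes the periodic form on a plain array; the object and method names are hypothetical and not part of the library.

```scala
object ReturnRatesSketch {
  // Periodic return between consecutive observations: cur / prev - 1.
  // A series with fewer than two observations yields an empty result.
  def periodicReturns(prices: Array[Double]): Array[Double] =
    prices.sliding(2).collect { case Array(prev, cur) => cur / prev - 1.0 }.toArray
}
```

With prices 100.0, 150.0, 75.0 this gives returns of 0.5 and -0.5, one fewer value than the input.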
Writes out the contents of this TimeSeriesRDD to a set of CSV files in the given directory, with an accompanying file in the same directory including the time index.
Computes summary statistics, such as the minimum, maximum, mean, and standard deviation, for each time series.
Returns a TimeSeriesRDD that's a sub-slice of the given series.
The start date for the slice.
The end date for the slice (inclusive).
Returns a TimeSeriesRDD that's a sub-slice of the given series.
The start date for the slice.
The end date for the slice (inclusive).
Converts a TimeSeriesRDD into a distributed IndexedRowMatrix, which makes it possible to apply Spark MLlib's statistical functions on matrices in a distributed fashion. This is only supported for a uniform time series index. See http://spark.apache.org/docs/latest/mllib-data-types.html for more information on the matrix data structure.
The number of partitions. Defaults to -1, which means the same number of partitions currently used by the TimeSeriesRDD.
An equivalent IndexedRowMatrix.
Essentially transposes the time series matrix to create an RDD where each record contains a single instant in time and all the values that correspond to it. Involves a shuffle operation.
In the returned RDD, the ordering of values within each record corresponds to the ordering of the time series records in the original RDD. The records are ordered by time.
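The transposition and the ordering guarantee can be pictured with local collections: each output record pairs one instant with the values of all series at that instant, in the order the series appear in the input. The types and names below are hypothetical stand-ins, and the sketch performs no shuffle because nothing is distributed.

```scala
object ToInstantsSketch {
  // Hypothetical helper: `series` is an ordered list of (key, values) pairs
  // whose value arrays all have the same length as `instants`.
  def toInstants(
      instants: Vector[Long],
      series: Vector[(String, Array[Double])]): Vector[(Long, Vector[Double])] =
    instants.zipWithIndex.map { case (t, i) =>
      // Values are emitted in the same order as the input series.
      (t, series.map { case (_, values) => values(i) })
    }
}
```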
Performs the same operations as toInstants but returns a DataFrame instead.
The schema of the returned DataFrame consists of a java.sql.Timestamp column named "instant" and Double columns named identically to their keys in the TimeSeriesRDD.
Returns a DataFrame where each row is an observation containing a timestamp, a key, and a value.
Converts a TimeSeriesRDD into a distributed RowMatrix. Indices in a RowMatrix are not significant, so this is a valid operation regardless of the type of time index. See http://spark.apache.org/docs/latest/mllib-data-types.html for more information on the matrix data structure.
An equivalent RowMatrix.
Returns a TimeSeriesRDD rebased on top of a new index. Any timestamps that exist in the new index but not in the existing index will be filled in with NaNs.
The DateTimeIndex for the new RDD.
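The NaN-filling rule for rebasing can be sketched for a single series on local collections. Instants are represented as plain Long values here, and the object and method names are hypothetical; this is an illustration of the described behavior, not the library's implementation.

```scala
object RebaseSketch {
  // Hypothetical helper: looks up each instant of the new index in the old
  // series and falls back to NaN when the old index does not contain it.
  def rebase(
      oldIndex: Vector[Long],
      values: Array[Double],
      newIndex: Vector[Long]): Array[Double] = {
    val byInstant = oldIndex.zip(values).toMap
    newIndex.map(t => byInstant.getOrElse(t, Double.NaN)).toArray
  }
}
```

Rebasing values 10.0, 20.0, 30.0 at instants 1, 2, 3 onto the index 2, 3, 4 keeps 20.0 and 30.0 and produces NaN for instant 4.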
(Since version 1.0.0) use mapPartitionsWithIndex and filter
(Since version 1.0.0) use mapPartitionsWithIndex and flatMap
(Since version 1.0.0) use mapPartitionsWithIndex and foreach
(Since version 1.2.0) use TaskContext.get
(Since version 0.7.0) use mapPartitionsWithIndex
(Since version 1.0.0) use mapPartitionsWithIndex
(Since version 1.0.0) use collect
A lazy distributed collection of univariate series with a conformed time dimension. Lazy in the sense that it is an RDD: it encapsulates all the information needed to generate its elements, but doesn't materialize them upon instantiation. Distributed in the sense that different univariate series within the collection can be stored and processed on different nodes. Within each univariate series, observations are not distributed. The time dimension is conformed in the sense that a single DateTimeIndex applies to all the univariate series. Each univariate series within the RDD has a String key to identify it.