IO Operations

[ ]:
import timeatlas as ta

TimeSeries and TimeSeriesDataset objects can be written to and read from files in multiple ways.

With TimeSeries

First, we create a few TimeSeries and a TimeSeriesDataset:

[90]:
from timeatlas.read_write import read_text, read_pickle, read_tsd, csv_to_tsd

ts = ta.TimeSeries.create('2019-01-01', '2019-01-04', freq='1D')
ts = ts.fill([i for i in range(len(ts))])
ts.class_label = "test label"
ts.metadata = ta.Metadata({'test': "metadata test"})

ts2 = ta.TimeSeries.create('2019-01-01', '2019-01-04', freq='H')
ts2 = ts2.fill([i for i in range(len(ts2))])

ts3 = ta.TimeSeries.create('2019-01-01', '2019-01-10', freq='1D')
ts3 = ts3.fill([i for i in range(len(ts3))])

tsd = ta.TimeSeriesDataset([ts, ts2, ts3])

In timeatlas we try to keep a TimeSeries and its Metadata close to each other. TimeSeries.to_text(path) writes the TimeSeries data into data.csv and the metadata into meta.json.

[91]:
ts.to_text('./data/timeseries/to_text/')

This saves the TimeSeries values in a CSV file (data.csv) and the metadata in a JSON file (meta.json).

[92]:
ts = read_text('./data/timeseries/to_text/')
[ ]:
ts
[94]:
ts.metadata
[94]:
{'label': 'test label', 'test': 'metadata test'}

In addition to to_text(path), TimeAtlas implements the following (see the sketch after this list):

  • to_pickle(path) saving the TimeSeries as a pickle,

  • to_df() returning a Pandas DataFrame,

  • to_array() returning a np.array,

  • to_darts() returning the TimeSeries in the u8darts format.
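
A minimal sketch of these exports (the paths are illustrative, and read_pickle(path), imported above, is assumed to load a pickled TimeSeries back):

[ ]:
# Export the same TimeSeries in the other supported formats
ts.to_pickle('./data/timeseries/ts.pickle')  # serialize as a pickle file
df = ts.to_df()                              # pandas DataFrame
arr = ts.to_array()                          # numpy array

# read_pickle is assumed to take the path of the pickle file
ts_restored = read_pickle('./data/timeseries/ts.pickle')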

With TimeSeriesDataset

TimeSeriesDataset implements most of the same functions as TimeSeries, since both extend the abstract base class AbstractBaseTimeSeries (excl. to_darts()).
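
For example, a sketch of the shared exports called on the dataset (the path is illustrative, and we assume to_pickle is among the shared methods):

[ ]:
# to_df() returns a single DataFrame with one column per TimeSeries
tsd_df = tsd.to_df()

# to_pickle(path) is assumed to mirror the TimeSeries method
tsd.to_pickle('./data/timeseriesdataset/tsd.pickle')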

TimeSeriesDataset.to_text(path) creates a subfolder in path for each TimeSeries in the TimeSeriesDataset, so that each data.csv and meta.json stay together:

data
└── time_series_dataset
    ├── 0
    │   ├── data.csv
    │   └── meta.json
    ├── 1
    │   ├── data.csv
    │   └── meta.json
    └── ...
[95]:
tsd
[95]:
   minimum  maximum  mean  median  kurtosis  skewness
0        0        3   1.5     1.5      -1.2       0.0
1        0       72  36.0    36.0      -1.2       0.0
2        0        9   4.5     4.5      -1.2       0.0
[96]:
tsd.to_text('./data/timeseriesdataset/to_text/')

To load the TimeSeriesDataset, read_tsd(path) searches the given folder for subfolders and checks whether they contain files in the accepted format.

[97]:
tsd_loaded = read_tsd('./data/timeseriesdataset/to_text/')
[98]:
tsd_loaded
[98]:
   minimum  maximum  mean  median  kurtosis  skewness
0        0        3   1.5     1.5      -1.2       0.0
1        0       72  36.0    36.0      -1.2       0.0
2        0        9   4.5     4.5      -1.2       0.0

Additionally, a CSV file can be loaded as a TimeSeriesDataset. Each column in the CSV is loaded as a TimeSeries and added to the TimeSeriesDataset.

[99]:
tsd_loaded.to_df().to_csv("./data/timeseriesdataset/tsd.csv")
[101]:
tsd_from_csv = csv_to_tsd("./data/timeseriesdataset/tsd.csv")
[105]:
tsd_from_csv.to_df()
[105]:
                     0_values  1_values  2_values
index
2019-01-01 00:00:00       0.0       0.0       0.0
2019-01-01 01:00:00       NaN       1.0       NaN
2019-01-01 02:00:00       NaN       2.0       NaN
2019-01-01 03:00:00       NaN       3.0       NaN
2019-01-01 04:00:00       NaN       4.0       NaN
...                       ...       ...       ...
2019-01-06 00:00:00       NaN       NaN       5.0
2019-01-07 00:00:00       NaN       NaN       6.0
2019-01-08 00:00:00       NaN       NaN       7.0
2019-01-09 00:00:00       NaN       NaN       8.0
2019-01-10 00:00:00       NaN       NaN       9.0

79 rows × 3 columns

In version 0.1.1 there are a few restrictions on the format of the CSV:

  1. Each column name has to start with a number and an underscore (e.g. 0_)

  2. The number has to be followed by “values”

The reason for these restrictions is the inclusion of timestamp labels in the TimeSeries, which are stored in columns named “0_labels_XY”.
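
A minimal sketch of a CSV that satisfies these restrictions (the file path and values are purely illustrative):

[ ]:
import pandas as pd

# Column names follow the required "<number>_values" pattern
index = pd.date_range('2019-01-01', '2019-01-04', freq='1D', name='index')
df = pd.DataFrame({'0_values': [0, 1, 2, 3],
                   '1_values': [3, 2, 1, 0]}, index=index)
df.to_csv('./data/timeseriesdataset/manual.csv')

# Each "<number>_values" column becomes a TimeSeries in the resulting dataset
tsd_manual = csv_to_tsd('./data/timeseriesdataset/manual.csv')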