Read and write data frames from and to a fast-storage (fst) file. Allows for compression and (file level) random access of stored data, even for compressed datasets. Multiple threads are used to obtain high (de-)serialization speeds but all background threads are re-joined before write_fst and read_fst return (reads and writes are stable). When using a data.table object for x, the key (if any) is preserved, allowing storage of sorted data. Methods read_fst and write_fst are equivalent to read.fst and write.fst (but the former syntax is preferred).

write_fst(x, path, compress = 50, uniform_encoding = TRUE)

read_fst(path, columns = NULL, from = 1, to = NULL,
  as.data.table = FALSE, old_format = FALSE)

write.fst(x, path, compress = 50, uniform_encoding = TRUE)

read.fst(path, columns = NULL, from = 1, to = NULL,
  as.data.table = FALSE, old_format = FALSE)

Arguments

x

a data frame to write to disk

path

path to fst file

compress

value in the range 0 to 100, indicating the amount of compression to use. Lower values mean larger file sizes. The default compression is set to 50.

uniform_encoding

If TRUE, all character vectors will be assumed to have elements with equal encoding. The encoding (latin1, UTF8 or native) of the first non-NA element will be used as the encoding for the whole column. This will be a correct assumption for most use cases. If uniform_encoding is set to FALSE, no such assumption will be made and all elements will be converted to the same encoding. The latter is a relatively expensive operation and will reduce write performance for character columns.

columns

Column names to read. The default is to read all columns.

from

Read data starting from this row number.

to

Read data up until this row number. The default is to read to the last row of the stored dataset.

as.data.table

If TRUE, the result will be returned as a data.table object. Any keys set on dataset x before writing will be retained. This allows for storage of sorted datasets.

old_format

Set to TRUE to read fst files generated with an fst package version lower than v0.8.0.
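As a minimal sketch of how the from, to and as.data.table arguments combine, the following assumes that the fst and data.table packages are installed. The temporary file path and the column names id and value are illustrative only and do not come from the package.

```r
library(fst)
library(data.table)

# Illustrative keyed data.table; the column names are hypothetical
dt <- data.table(id = 1:1000, value = rnorm(1000))
setkey(dt, id)

path <- tempfile(fileext = ".fst")
write_fst(dt, path)

# Read rows 100 through 200 (both ends included) as a data.table
part <- read_fst(path, from = 100, to = 200, as.data.table = TRUE)

nrow(part)  # number of rows in the selected range
key(part)   # key that was set before writing

unlink(path)
```

Because only the requested row range is read from disk, this pattern is useful for inspecting a slice of a large sorted dataset without loading all of it.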

Value

read_fst returns a data frame with the selected columns and rows. write_fst invisibly returns x (so you can use this function in a pipeline).
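The invisible return value of write_fst can be illustrated with a short sketch. The data frame and temporary file path below are made up for the example; the write call is wrapped in nrow() to show that x is passed through unchanged.

```r
library(fst)

# Small illustrative data frame
x <- data.frame(A = 1:10, B = letters[1:10])
path <- tempfile(fileext = ".fst")

# write_fst returns x invisibly, so its result can feed the next call
n <- nrow(write_fst(x, path))

# read_fst returns a data frame holding only the requested column
y <- read_fst(path, columns = "A")

unlink(path)
```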

Examples

# Sample dataset
x <- data.frame(A = 1:10000, B = sample(c(TRUE, FALSE, NA), 10000, replace = TRUE))

# Default compression
write_fst(x, "dataset.fst")  # filesize: 17 KB
y <- read_fst("dataset.fst")  # read fst file
#> Loading required namespace: data.table

# Maximum compression
write_fst(x, "dataset.fst", 100)  # filesize: 4 KB
y <- read_fst("dataset.fst")  # read fst file

# Random access
y <- read_fst("dataset.fst", "B")  # read selection of columns
y <- read_fst("dataset.fst", "A", 100, 200)  # read selection of columns and rows
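To see how the compress argument affects file size on your own data, a comparison along the following lines may help. This is only a sketch: the three compression settings are arbitrary choices, the files are written to temporary paths, and the resulting sizes depend on the data and the fst version, so no expected output is shown.

```r
library(fst)

x <- data.frame(A = 1:10000,
                B = sample(c(TRUE, FALSE, NA), 10000, replace = TRUE))

# Write the same data at several compression settings and record file sizes
levels <- c(0, 50, 100)
sizes <- sapply(levels, function(lvl) {
  path <- tempfile(fileext = ".fst")
  write_fst(x, path, compress = lvl)
  size <- file.size(path)
  unlink(path)
  size
})
names(sizes) <- levels

sizes  # file size in bytes for each compression setting
```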