Read and write fst files

Read and write data frames from and to a fast-storage (`fst`) file. The format supports compression and file-level random access to stored data, even for compressed datasets. Multiple threads are used to obtain high (de-)serialization speeds, but all background threads are re-joined before `write_fst` and `read_fst` return, so reads and writes are stable. When `x` is a `data.table` object, its key (if any) is preserved, allowing storage of sorted data. The functions `read_fst` and `write_fst` are equivalent to `read.fst` and `write.fst`, but the former syntax is preferred.

write_fst(x, path, compress = 50, uniform_encoding = TRUE)

write.fst(x, path, compress = 50, uniform_encoding = TRUE)

read_fst(
  path,
  columns = NULL,
  from = 1,
  to = NULL,
  as.data.table = FALSE,
  old_format = FALSE
)

read.fst(
  path,
  columns = NULL,
  from = 1,
  to = NULL,
  as.data.table = FALSE,
  old_format = FALSE
)

Arguments

x: a data frame to write to disk
path: path to fst file
compress: value in the range 0 to 100, indicating the amount of compression to use. Lower values mean larger file sizes. The default compression is set to 50.
uniform_encoding: If `TRUE`, all character vectors will be assumed to have elements with equal encoding. The encoding (latin1, UTF8 or native) of the first non-NA element will be used as the encoding for the whole column. This will be a correct assumption for most use cases. If `uniform_encoding` is set to `FALSE`, no such assumption will be made and all elements will be converted to the same encoding. The latter is a relatively expensive operation and will reduce write performance for character columns.
columns: Column names to read. The default is to read all columns.
from: Read data starting from this row number.
to: Read data up until this row number. The default is to read to the last row of the stored dataset.
as.data.table: If TRUE, the result will be returned as a data.table object. Any keys set on dataset x before writing will be retained. This allows for storage of sorted datasets. This option requires the data.table package to be installed.
old_format: must be FALSE. The old fst file format is deprecated and can only be read and converted with fst package versions 0.8.0 to 0.8.10.
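
As a rough illustration of how these arguments interact, the sketch below writes the same data frame at two compression settings and then reads back a column and row selection. The object names (`df`, `path_low`, `path_high`, `part`) are illustrative only, and the actual file sizes depend on the data.

```r
library(fst)

# Illustrative data; all object names here are made up for this sketch
df <- data.frame(A = 1:10000, B = sample(c(TRUE, FALSE, NA), 10000, replace = TRUE))

path_low  <- tempfile(fileext = ".fst")
path_high <- tempfile(fileext = ".fst")

write_fst(df, path_low, compress = 0)     # lowest compression setting
write_fst(df, path_high, compress = 100)  # highest compression setting

# Compare the resulting file sizes in bytes
size_low  <- file.size(path_low)
size_high <- file.size(path_high)

# Read only column A, from row 100 up to row 200
part <- read_fst(path_high, columns = "A", from = 100, to = 200)
```

Comparing `size_low` with `size_high` shows the effect of `compress` on this particular dataset, and `part` contains only the requested column and row range.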

Value

`read_fst` returns a data frame with the selected columns and rows. `write_fst` writes `x` to a `fst` file and invisibly returns `x` (so you can use this function in a pipeline).
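
Because `write_fst` returns `x` invisibly, a write can sit in the middle of a chain of calls. A minimal sketch, with `df`, `path`, `n_rows` and `res` as illustrative names:

```r
library(fst)

df <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))
path <- tempfile(fileext = ".fst")

# write_fst() hands back its input, so the result can be passed straight on
n_rows <- nrow(write_fst(df, path))

# withVisible() reports whether a value would be auto-printed at the console
res <- withVisible(write_fst(df, path))
res$visible
```

Here `res$value` holds the data frame that was written, and `res$visible` indicates whether it would have been printed at the top level.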

Examples

# Sample dataset
x <- data.frame(A = 1:10000, B = sample(c(TRUE, FALSE, NA), 10000, replace = TRUE))

# Default compression
fst_file <- tempfile(fileext = ".fst")
write_fst(x, fst_file)  # filesize: 17 KB
y <- read_fst(fst_file) # read fst file

# Maximum compression
write_fst(x, fst_file, 100)  # filesize: 4 KB
y <- read_fst(fst_file) # read fst file

# Random access
y <- read_fst(fst_file, "B") # read selection of columns
y <- read_fst(fst_file, "A", 100, 200) # read selection of columns and rows
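
The key preservation described for `data.table` input can be sketched as follows. This assumes the data.table package is installed, which is why the code is guarded with `requireNamespace`; the names `dt`, `key_file` and `z` are illustrative.

```r
library(fst)

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::data.table(A = c(3L, 1L, 2L), B = c("c", "a", "b"))
  data.table::setkey(dt, A)   # sorts by A and marks it as the key

  key_file <- tempfile(fileext = ".fst")
  write_fst(dt, key_file)

  # Read back as a data.table and inspect its key
  z <- read_fst(key_file, as.data.table = TRUE)
  data.table::key(z)
}
```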