Writing data to HDF5 files is simple when the defaults suffice.
However, exercising finer control over how an R object is written out
quickly requires complicated code. WriteH5Group wraps some of these
complex constructs with sensible defaults, giving greater control over
how data are written to disk.
These defaults were chosen to fit best with h5Seurat files; see
vignette("h5Seurat-spec") for more details.
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for ANY
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for array
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for Assay
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for data.frame
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for dgCMatrix
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for DimReduc
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for factor
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for Graph
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for list
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for logical
WriteH5Group(x, name, hgroup, verbose = TRUE)

# S4 method for SeuratCommand
WriteH5Group(x, name, hgroup, verbose = TRUE)
Argument | Description |
---|---|
x | An object |
name | Name to save data as |
hgroup | An HDF5 file or group ( |
verbose | Show progress updates |
Invisibly returns NULL
# \donttest{
# Setup an HDF5 file
hfile <- hdf5r::H5File$new(filename = tempfile(fileext = '.h5'), mode = 'a')
# }
# \donttest{
# Data frames are stored as either datasets or groups, depending on the
# presence of factor columns
df <- data.frame(
  x = c('g1', 'g1', 'g2', 'g1', 'g2'),
  y = 1:5,
  stringsAsFactors = FALSE
)

# When no factor columns are present, the data frame is written as a single
# HDF5 compound dataset
WriteH5Group(x = df, name = 'df', hgroup = hfile)
hfile[['df']]
#> Class: H5Group
#> Filename: /tmp/RtmpVp8Dsb/file1e06c235d56.h5
#> Group: /df
#> Attributes: colnames
#> Listing:
#>  name    obj_type dataset.dims dataset.type_class
#>     x H5I_DATASET            5         H5T_STRING
#>     y H5I_DATASET            5        H5T_INTEGER

# When factors are present, the data frame is written as a group
# This is because h5py does not implement HDF5 Enums, so factor level
# information would be lost
df$x <- factor(x = df$x)
WriteH5Group(x = df, name = 'df.factor', hgroup = hfile)
hfile[['df.factor']]
#> Class: H5Group
#> Filename: /tmp/RtmpVp8Dsb/file1e06c235d56.h5
#> Group: /df.factor
#> Attributes: colnames
#> Listing:
#>  name    obj_type dataset.dims dataset.type_class
#>     x   H5I_GROUP         <NA>               <NA>
#>     y H5I_DATASET            5        H5T_INTEGER
# }
# \donttest{
# Factors turn into a group with two components: values and levels
# This is to preserve level information for HDF5 APIs that don't implement
# the HDF5 Enum type (eg. h5py)
# values corresponds to the integer values of each member of a factor
# levels is a string dataset with one entry per level
fctr <- factor(x = c('g1', 'g1', 'g2', 'g1', 'g2'))
WriteH5Group(x = fctr, name = 'factor', hgroup = hfile)
hfile[['factor']]
#> Class: H5Group
#> Filename: /tmp/RtmpVp8Dsb/file1e06c235d56.h5
#> Group: /factor
#> Listing:
#>    name    obj_type dataset.dims dataset.type_class
#>  levels H5I_DATASET            2         H5T_STRING
#>  values H5I_DATASET            5        H5T_INTEGER
# }
# \donttest{
# Logicals get encoded as integers with the following mapping
# FALSE becomes 0L
# TRUE becomes 1L
# NA becomes 2L
# These are stored as H5T_INTEGERS instead of H5T_LOGICALS
# Additionally, an attribute called "s3class" is written with the value of "logical"
WriteH5Group(c(TRUE, FALSE, NA), name = "logicals", hgroup = hfile)
hfile[["logicals"]]
#> Class: H5D
#> Dataset: /logicals
#> Filename: /tmp/RtmpVp8Dsb/file1e06c235d56.h5
#> Access type: H5F_ACC_RDWR
#> Attributes: s3class
#> Datatype: H5T_STD_I32LE
#> Space: Type=Simple     Dims=3     Maxdims=Inf
#> Chunk: 2048
hfile[["logicals"]]$attr_open("s3class")$read()
#> [1] "logical"
#> [1] TRUE
# }
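The factor and logical encodings described in the example comments above can be mimicked in base R, which may help when reading such data back by hand. The following is a minimal sketch only: the helper names (encode_logical, decode_logical, split_factor, join_factor) are hypothetical and not part of SeuratDisk, and it assumes that the stored factor values are R's 1-based integer codes, which the documentation above does not state explicitly.

```r
# Hypothetical helpers for illustration; these are NOT SeuratDisk functions

# Encode a logical vector as integers: FALSE -> 0L, TRUE -> 1L, NA -> 2L
encode_logical <- function(x) {
  stopifnot(is.logical(x))
  out <- as.integer(x)      # FALSE -> 0L, TRUE -> 1L, NA -> NA_integer_
  out[is.na(out)] <- 2L     # NA gets its own integer code
  out
}

# Reverse the mapping: 0L -> FALSE, 1L -> TRUE, 2L -> NA
decode_logical <- function(x) {
  c(FALSE, TRUE, NA)[x + 1L]
}

# Split a factor into integer values and a character vector of levels
# (assumption: values are R's 1-based integer codes)
split_factor <- function(x) {
  stopifnot(is.factor(x))
  list(values = as.integer(x), levels = levels(x))
}

# Rebuild a factor from its values and levels
join_factor <- function(parts) {
  factor(parts$levels[parts$values], levels = parts$levels)
}

lgl <- c(TRUE, FALSE, NA)
enc <- encode_logical(lgl)
dec <- decode_logical(enc)

fctr <- factor(c('g1', 'g1', 'g2', 'g1', 'g2'))
parts <- split_factor(fctr)
rebuilt <- join_factor(parts)
```

Keeping values and levels as two plain arrays means a reader without HDF5 Enum support only needs integer and string datasets to recover the original factor.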