
CSC Format for Sparse Matrices

Updated 26 December 2025
  • CSC Format is a sparse matrix representation using three arrays (val, rowidx, colptr) that enable efficient column-oriented access and traversal.
  • Extensions such as VCSC and IVCSC leverage redundancy in column data to reduce memory usage significantly, with reductions up to 7.5× in high-redundancy scenarios.
  • CSC supports fast operations like SpMV with O(nnz) time complexity, balancing storage requirements with computational performance in scientific and numerical applications.

The Compressed Sparse Column (CSC) format is a canonical data structure for representing sparse matrices in memory, particularly designed to optimize storage efficiency and computational performance for column-oriented access patterns. Widely adopted in numerical linear algebra and scientific computing, CSC is foundational for high-performance sparse matrix-vector and matrix-matrix operations. Extensions of CSC, such as Value-Compressed Sparse Column (VCSC) and Index- and Value-Compressed Sparse Column (IVCSC), have been developed to address scenarios with additional data redundancy, relevant in domains like genomics and recommender systems (Ruiter et al., 2023).

1. CSC Format: Structure and Semantics

CSC encodes a sparse $m \times n$ matrix $A$ with $nnz$ nonzero elements using three contiguous arrays:

  • $\texttt{val}[0 \ldots nnz-1]$: the nonzero values, stored column-wise.
  • $\texttt{rowidx}[0 \ldots nnz-1]$: the corresponding row indices for each value.
  • $\texttt{colptr}[0 \ldots n]$: a pointer array marking the boundaries of each column in $\texttt{val}$ and $\texttt{rowidx}$, where $\texttt{colptr}[j]$ is the start index of column $j$ and $\texttt{colptr}[n] = nnz$ (Ruiter et al., 2023).

Example:

Given

AA1

with AA2, CSC arrays are:

  • AA3
  • AA4
  • AA5

This structure enables efficient traversal and access for operations (e.g., SpMV) and balances storage needs and random-access efficiency.
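The layout of the three arrays can be illustrated with a minimal Python sketch. The helper name `dense_to_csc` and the small matrix are illustrative assumptions, not the example from the text; indices are 0-based.

```python
def dense_to_csc(dense):
    """Convert a dense list-of-rows matrix into CSC arrays (val, rowidx, colptr)."""
    m = len(dense)
    n = len(dense[0]) if m else 0
    val, rowidx, colptr = [], [], [0]
    for j in range(n):                # walk columns left to right
        for i in range(m):            # rows top to bottom within a column
            if dense[i][j] != 0:
                val.append(dense[i][j])
                rowidx.append(i)
        colptr.append(len(val))       # end of column j, start of column j + 1
    return val, rowidx, colptr


# Hypothetical 3x3 matrix with 5 nonzeros (not taken from the source).
A = [
    [10, 0, 0],
    [0, 20, 30],
    [40, 0, 50],
]
val, rowidx, colptr = dense_to_csc(A)
print(val)     # [10, 40, 20, 30, 50]
print(rowidx)  # [0, 2, 1, 1, 2]
print(colptr)  # [0, 2, 3, 5]
```

Column $j$ occupies positions `colptr[j]` up to, but not including, `colptr[j + 1]` in both `val` and `rowidx`, and the last pointer equals the number of nonzeros.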

2. Storage Analysis and Memory Footprint

Memory usage in CSC is determined by the data types used to represent matrix entries and indices:

$$M_{\text{CSC}} = nnz \cdot S_{\text{val}} + nnz \cdot S_{\text{idx}} + (n+1) \cdot S_{\text{idx}}$$

where $S_{\text{val}}$ is the byte size per value (e.g., 8 bytes for double), and $S_{\text{idx}}$ is the byte size per index (e.g., 4 bytes for 32-bit integer) (Ruiter et al., 2023).

The first two terms account for storage of nonzero values and their row indices, while the final term encodes the $n+1$ column pointers. Compared with the Coordinate (COO) format, which uses $nnz \cdot (S_{\text{val}} + 2 S_{\text{idx}})$ bytes, CSC reduces storage overhead by omitting explicit storage of column indices per nonzero.
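As a rough arithmetic check of this accounting (values, row indices, and column pointers for CSC; values plus two indices per nonzero for COO), the following sketch computes both footprints. The function names, parameter names, and matrix dimensions are assumptions chosen for illustration.

```python
def csc_bytes(nnz, n_cols, value_bytes=8, index_bytes=4):
    """Bytes for CSC: values + row indices + (n_cols + 1) column pointers."""
    return nnz * value_bytes + nnz * index_bytes + (n_cols + 1) * index_bytes


def coo_bytes(nnz, value_bytes=8, index_bytes=4):
    """Bytes for COO: one value, one row index, one column index per nonzero."""
    return nnz * (value_bytes + 2 * index_bytes)


# Hypothetical matrix: one million nonzeros spread over 10,000 columns.
nnz, n_cols = 1_000_000, 10_000
csc = csc_bytes(nnz, n_cols)
coo = coo_bytes(nnz)
print(csc)  # 12040004
print(coo)  # 16000000
print(f"CSC / COO = {csc / coo:.2%}")
```

With 8-byte values and 4-byte indices, CSC comes out at roughly three quarters of the COO size whenever the number of columns is small relative to the number of nonzeros.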

3. Computational Operations and Complexity

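CSC is optimized for column-access patterns and is particularly efficient for:

  • Sparse Matrix-Vector Multiplication (SpMV), $y = Ax$:

Iteration proceeds column-by-column: for each column $j$, every stored entry $k$ in $[\texttt{colptr}[j], \texttt{colptr}[j+1])$ adds $\texttt{val}[k] \cdot x_j$ to $y_{\texttt{rowidx}[k]}$. This yields $O(nnz)$ time complexity, with contiguous memory access patterns for $\texttt{val}$ and $\texttt{rowidx}$, advantageous for cache efficiency (Ruiter et al., 2023).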
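The column-by-column traversal described above can be sketched in pure Python as follows. The function name `csc_spmv` and the sample arrays are illustrative assumptions; a production kernel would typically be written in a compiled language.

```python
def csc_spmv(m, val, rowidx, colptr, x):
    """Compute y = A @ x for an m-row matrix A stored in CSC form."""
    n = len(colptr) - 1
    if len(x) != n:
        raise ValueError("x must have one entry per column")
    y = [0.0] * m
    for j in range(n):                              # one pass per column
        xj = x[j]
        for k in range(colptr[j], colptr[j + 1]):   # nonzeros of column j
            y[rowidx[k]] += val[k] * xj             # scatter into the output row
    return y


# CSC arrays of a hypothetical 3x3 matrix:
# [[10,  0,  0],
#  [ 0, 20, 30],
#  [40,  0, 50]]
val = [10, 40, 20, 30, 50]
rowidx = [0, 2, 1, 1, 2]
colptr = [0, 2, 3, 5]
print(csc_spmv(3, val, rowidx, colptr, [1, 2, 3]))  # [10.0, 130.0, 190.0]
```

Each stored nonzero is visited exactly once, and `val` and `rowidx` are read sequentially; only the writes into `y` are scattered.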

  • Element Lookup $A_{ij}$:

Requires scanning $\texttt{rowidx}$ in $[\texttt{colptr}[j], \texttt{colptr}[j+1])$ for $i$, yielding $O(nnz_j)$ per lookup, where $nnz_j$ is the count of nonzeros in column $j$.
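A minimal sketch of such a lookup is shown below, assuming 0-based indices; the function name `csc_get` and the sample arrays are hypothetical.

```python
def csc_get(val, rowidx, colptr, i, j):
    """Return A[i][j] by scanning the stored row indices of column j."""
    for k in range(colptr[j], colptr[j + 1]):
        if rowidx[k] == i:
            return val[k]
    return 0  # entry is not stored, so it is an implicit zero


# CSC arrays of a hypothetical 3x3 matrix with 5 nonzeros.
val = [10, 40, 20, 30, 50]
rowidx = [0, 2, 1, 1, 2]
colptr = [0, 2, 3, 5]
print(csc_get(val, rowidx, colptr, 2, 0))  # 40
print(csc_get(val, rowidx, colptr, 0, 1))  # 0
```

If the row indices within each column are kept sorted, the linear scan could be replaced by a binary search (for example with Python's standard `bisect` module), at the cost of requiring that ordering to be maintained.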

4. Extensions: VCSC and IVCSC

Conventional CSC captures only sparsity, not redundancy in the value distribution within matrix columns. Two recent extensions have addressed this limitation (Ruiter et al., 2023):

  • Value-Compressed Sparse Column (VCSC): each column stores
    • the unique nonzero values;
    • the occurrence count of each unique value;
    • the row indices, partitioned by value group;
    • offsets used to flatten these arrays.

Memory usage therefore depends on the number of unique values per column: each unique value and its count are stored once per column, while one row index is still stored per nonzero. When redundancy is large, i.e., when a column holds far fewer unique values than nonzeros, this achieves substantial compression.
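The value grouping that VCSC applies within a single column can be illustrated with a minimal sketch. The names `vcsc_encode_column`, `values`, `counts`, and `indices` are hypothetical labels for the components listed above, and the flattening offsets are omitted; the actual memory layout in Ruiter et al. (2023) may differ.

```python
def vcsc_encode_column(col_vals, col_rows):
    """Group one column's nonzeros by value.

    Returns (values, counts, indices): the unique values, how often each
    occurs, and the row indices concatenated group by group.
    """
    groups = {}
    for v, r in zip(col_vals, col_rows):
        groups.setdefault(v, []).append(r)   # dicts keep insertion order
    values = list(groups)
    counts = [len(groups[v]) for v in values]
    indices = [r for v in values for r in groups[v]]
    return values, counts, indices


# Hypothetical column with 5 nonzeros but only 2 distinct values.
col_vals = [1, 2, 1, 1, 2]
col_rows = [0, 3, 4, 7, 9]
values, counts, indices = vcsc_encode_column(col_vals, col_rows)
print(values)   # [1, 2]
print(counts)   # [3, 2]
print(indices)  # [0, 4, 7, 3, 9]
```

Here five stored values shrink to two values plus two counts, while the five row indices are kept; the saving grows as the same value repeats more often within a column.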

  • Index- and Value-Compressed Sparse Column (IVCSC): extends VCSC by also compressing the row indices.
    • Row indices within each value group are delta-encoded, and the positive deltas are stored using the minimal possible byte width.
    • Each group is prefixed with its byte width, which minimizes index storage, especially when deltas are small.

In practice, IVCSC reaches up to $7.5\times$ memory reduction over CSC in redundancy-dominated matrices (e.g., genomics data with a high mean redundancy ratio, MMR) (Ruiter et al., 2023).
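The idea of delta encoding with a per-group byte width can be sketched as follows. This is an illustration only: the one-byte width prefix, the little-endian byte order, and the function names are assumptions, not the IVCSC storage layout defined by Ruiter et al. (2023).

```python
def delta_encode_group(rows):
    """Pack strictly increasing row indices as a width byte plus fixed-width deltas."""
    if not rows:
        raise ValueError("a value group must contain at least one row index")
    # The first entry is stored as a delta from zero, the rest as gaps.
    deltas = [rows[0]] + [b - a for a, b in zip(rows, rows[1:])]
    # Smallest number of whole bytes that can hold the largest delta.
    width = max(1, (max(deltas).bit_length() + 7) // 8)
    payload = b"".join(d.to_bytes(width, "little") for d in deltas)
    return bytes([width]) + payload


def delta_decode_group(blob):
    """Invert delta_encode_group by accumulating the stored deltas."""
    width = blob[0]
    rows, acc = [], 0
    for off in range(1, len(blob), width):
        acc += int.from_bytes(blob[off:off + width], "little")
        rows.append(acc)
    return rows


rows = [3, 4, 10, 300]                 # hypothetical row indices of one value group
blob = delta_encode_group(rows)
print(blob[0], len(blob))              # 2 9
print(delta_decode_group(blob))        # [3, 4, 10, 300]
```

In this sketch a single large gap (290) forces two bytes for every delta in the group, which is why small, tightly clustered deltas compress best.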

5. Compression Ratios and Empirical Results

Compression capability of CSC, VCSC, and IVCSC formats is strongly influenced by the redundancy of nonzero values:

  • In high-redundancy regimes (column-wise repeated values), VCSC reduces memory usage substantially vs. CSC, and IVCSC by up to $7.5\times$.
  • On a single-cell genomics matrix (MMR 0.987), memory usage as a percentage of the uncompressed COO size was: CSC 75%, VCSC 26%, IVCSC 9%.
  • When redundancy is low (e.g., unique floats per entry), VCSC and IVCSC offer no benefit and may increase memory usage (Ruiter et al., 2023).

| Dataset | Sparsity | MMR | CSC (% of COO) | VCSC (% of COO) | IVCSC (% of COO) |
|---|---|---|---|---|---|
| Single-cell | 91.9% | 0.987 | 75 | 26 | 9 |
| MovieLens | 97.0% | 0.616 | 76 | 33 | 18 |
| Simulated unique | 90.0% | 0.0 | 75 | 100 | 105 |

A plausible implication is that the choice of format should be matched to the data’s redundancy profile.

6. Performance Trade-Offs and Applicability

  • CSC/COO: Offer the fastest SpMV and SpMM, with $O(nnz)$ storage. Construction from COO is fastest when redundancy is low.
  • VCSC: Incurs additional per-column looping overhead and offers its largest gains when column-wise redundancy is high. In benchmarked ML/genomics workloads, SpMV runtime stays close to that of CSC.
  • IVCSC: Maximizes memory savings (up to $7.5\times$ vs. CSC) at the cost of slower SpMV (e.g., 420 ms for IVCSC vs. 140 ms for CSC, about $3\times$ slower, on a matrix with 90% mean redundancy). Construction time remains tractable at low redundancy.

Selection guidelines for practitioners:

  • Use CSC/COO for general sparse data with little redundancy.
  • Use VCSC when columns aggregate repeated values (count data, ratings, genomics); minor performance cost, significant memory reduction.
  • Use IVCSC where memory capacity is limiting and reduced I/O bandwidth or slower decompression can be tolerated (e.g., offline post-processing of very large matrices) (Ruiter et al., 2023).
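These guidelines could be turned into a simple heuristic, sketched below under loud assumptions: the redundancy measure (one minus the fraction of unique values per column, averaged over non-empty columns) is an illustrative stand-in and not necessarily the MMR used by Ruiter et al. (2023), and the 0.5 threshold and all function names are hypothetical.

```python
def column_redundancy(col_vals):
    """Assumed measure: 1 - unique/nnz for one column (0.0 if the column is empty)."""
    if not col_vals:
        return 0.0
    return 1.0 - len(set(col_vals)) / len(col_vals)


def mean_redundancy(val, colptr):
    """Average the assumed per-column measure over all non-empty CSC columns."""
    cols = [val[colptr[j]:colptr[j + 1]] for j in range(len(colptr) - 1)]
    nonempty = [c for c in cols if c]
    if not nonempty:
        return 0.0
    return sum(column_redundancy(c) for c in nonempty) / len(nonempty)


def suggest_format(redundancy, memory_bound=False, high=0.5):
    """Hypothetical rule of thumb; `high` is an arbitrary illustrative threshold."""
    if redundancy < high:
        return "CSC"
    return "IVCSC" if memory_bound else "VCSC"


# Hypothetical CSC value array with two columns: [1, 1, 1, 2] and [5, 5].
val = [1, 1, 1, 2, 5, 5]
colptr = [0, 4, 6]
r = mean_redundancy(val, colptr)
print(r)                                     # 0.5
print(suggest_format(r))                     # VCSC
print(suggest_format(r, memory_bound=True))  # IVCSC
```

In practice the threshold would need to be calibrated against measured memory use and SpMV runtime on representative data.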

7. Contextual Relevance and Limitations

CSC remains a preferred default for a broad spectrum of sparse linear algebra workloads due to low overhead and compatibility with major computing frameworks. However, its inability to exploit nonzero value redundancy has motivated the creation of VCSC and IVCSC, particularly for modern data modalities with structured repetition. The effectiveness of these extensions is empirically bounded by the actual redundancy and local row-index proximity in the data. Deployment of VCSC and IVCSC should explicitly consider their overheads in construction time, per-column uniqueness ratio, and compatibility with existing computational routines—factors that delimit their applicability for certain classes of problems (Ruiter et al., 2023).
