Skip to content

xtellect/fc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fc — Floating-Point Compressor

Copyright (c) 2026 Praveen Vaddadi thynktank@gmail.com Licensed under the Apache License, Version 2.0. See LICENSE and NOTICE.

fc is a lossless compressor for streams of IEEE-754 64-bit doubles. It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result. Compression and decompression are multi-threaded with POSIX threads and the hot paths are hand-vectorized for x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.

The current version string is fc 1.56 (see fc_ver in fc.c).

Representative numbers

Aggregate result of the bundled test_fc harness across 17 synthetic datasets, 1 Mi doubles each (8 MiB per dataset, ≈136 MiB / 142.6 MB decimal total), median-of-5 trials per dataset, 8 encoder threads, decoder threads auto-selected, on the maintainer's x86-64 + AVX2 box. The "Ratio" column is total_original / total_compressed; the throughput columns are the harmonic mean of per-dataset MB/s. Numbers come straight out of test_fc.csv and will move on different hardware — re-run make test to regenerate.

Codec Ratio Encode (MB/s) Decode (MB/s)
fc 3.07 120 1277
zstd -3 2.07 528 1556
zstd -9 2.09 111 1572
fpzip 1.71 123 121
lz4hc -9 1.69 51 5835
gorilla 1.66 683 971
lz4 1.62 2176 5353

Headline: fc is the best size-oriented floating-point compressor in this benchmark, not the best all-around compressor. It wins ratio on 10 of 17 datasets outright and is never worse than roughly third on any of them.

Where fc wins (often by a lot):

  • Structured / analytic floats — constant ≈ 39,756× (vs. zstd-9 ≈ 11,619×), parabolic ≈ 2,973× (vs. 2.5×), int×1000 ≈ 5,388× (vs. 4.3×), piecewise ≈ 26.8× (vs. 14.9×), stocks ≈ 15.6× (vs. 7.1×).
  • Periodic / quasi-periodic signals — sin-low-freq, audio-mix, ar2-damped beat gorilla and fpzip by 1.3×–2.7×.
  • Decode is fast and parallel (auto-threaded inside fc_dec): aggregate ≈ 1.28 GB/s, ~10× faster than encode, ~10× faster than fpzip, ~1.3× faster than gorilla, and ~80% of zstd-3 decode. Good fit for write-once / read-many time-series stores.

Where it loses:

  • Byte-pattern-friendly quantized data — zstd wins on decimal-cents (zstd-9 ≈ 3,465× vs fc 268×), dict-16 (zstd-9 ≈ 10,268× vs fc 4,660×), and quantized-4lvl (zstd-9 ≈ 9,675× vs fc 1,037×). General-purpose LZ catches structure fc doesn't currently model.
  • Noisy natural-float arrays — fpzip is marginally better on random-walk (1.39 vs 1.38), climate (1.395 vs 1.380), geo-coords (2.012 vs 2.000), and sensor-noisy (1.397 vs 1.386). Differences are small but consistent.
  • Encode throughput — ~120 MB/s aggregate. The mode competition is what buys the ratio; you pay for it in encoder CPU. If encode is the bottleneck, lz4 (~2.2 GB/s) or zstd-3 (~530 MB/s) dominate.
  • Pseudo-random data — ratio ≈ 1.000. Same as every other lossless codec; fc just doesn't waste headers.

Hard constraints: lossless only (no quantization). The encoder takes 8-byte-aligned input in multiples of 8 bytes; the heuristics and most modes are tuned for IEEE-754 double-precision bit patterns, so other 8-byte payloads will compress but you may leave ratio on the table. Requires x86-64 with AVX2 + SSE4.2 + BMI + LZCNT.

Rule of thumb:

  • Want the smallest file for a floating-point stream and decode-side latency matters → fc.
  • Encode-bound on heterogeneous data → zstd or lz4.
  • Noisy scientific arrays where every byte counts → benchmark fc vs fpzip on your data; they're within ~1% of each other.

Status

This is a research-grade single-file library. The on-disk format is versioned by a magic number in the stream header; backwards-incompatible format changes will bump the magic. Treat compressed output as opaque — do not rely on internal layout.

Repository layout

File Purpose
fc.h Public API (encode, decode, monitoring counters).
fc.c Compressor implementation. ~50 codecs + dispatch + threads.
gorilla.h Bundled Redis "Gorilla" XOR/delta codec — third-party.
gorilla.c Bundled Redis "Gorilla" XOR/delta codec — third-party.
test_fc.c Round-trip + benchmark harness over 17 synthetic datasets.
Makefile Builds fc.o, gorilla.o, and the test_fc benchmark.
LICENSE Apache License 2.0 (this project's own code).
NOTICE Attribution and third-party license disclosures.
CONTRIBUTING.md Contribution guide and DCO.

Public API

The whole API is in fc.h:

typedef struct { int p, t, c; } fc_cfg;

size_t fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg);
size_t fc_dec(const void *src, size_t bytes, void *dst);

extern const char  fc_ver[];
extern unsigned long fc_dec_mode_hist[64];
extern unsigned long fc_enc_mode_time_ns[64];
extern unsigned long fc_enc_mode_calls[64];
extern unsigned long fc_enc_mode_wins[64];

void        fc_dec_mode_hist_reset(void);
const char *fc_mode_name(int mode);

fc_cfg fields

Field Meaning
p log2 of the predictor table size. Clamped to [10, 16] inside fc_enc. The test_fc benchmark passes 18, which is silently clamped to 16.
t Worker thread count for the encoder. The decoder takes no fc_cfg and auto-selects threads via sysconf(_SC_NPROCESSORS_ONLN), capped at 128 and at the per-stream block count.
c Reserved. Currently unused ((void)cfg.c; in fc_enc).

fc_enc(const void *src, size_t bytes, void *dst, fc_cfg cfg)

Compresses bytes of input (a multiple of 8 — one double per 8 bytes). Returns the number of compressed bytes written, or 0 on failure: NULL src / dst, bytes not a multiple of 8, or an internal allocation failure.

The destination buffer must be sized by the caller. The library does not publish a closed-form comp_bound. The bundled benchmark allocates 2 * bytes + 64 KiB (test_fc.c), which has been sufficient for every dataset in the test suite; for arbitrary inputs, allocate at least that much and check the return value (the encoder will return 0 rather than overflow).

fc_dec(const void *src, size_t bytes, void *dst)

Decompresses a stream produced by fc_enc. Returns the number of decompressed bytes written, or 0 on failure. Failure conditions: NULL src / dst, truncated header, wrong magic, header predsizelg2 outside [10, 16], a per-block header that claims more input or output than remains, total decoded size that doesn't match the header's original_bytes, or an internal allocation failure. The caller is responsible for sizing dst from the original byte count recorded in the stream header (use fc_dec once with a header read, or simply allocate the original buffer length you compressed).

Note: unknown / unimplemented mode IDs are not currently treated as a hard failure — the affected block is silently left as zeros. Treat streams from untrusted sources accordingly.

Monitoring counters

The fc_*_mode_* arrays are 64-entry tables indexed by mode ID. They are updated atomically and are intended for diagnostics only:

  • fc_dec_mode_hist[m] — number of blocks decoded with mode m.
  • fc_enc_mode_calls[m] — number of times mode m was evaluated during the encoder competition.
  • fc_enc_mode_wins[m] — number of times mode m won and was written.
  • fc_enc_mode_time_ns[m] — cumulative wall-clock nanoseconds spent evaluating mode m.

fc_mode_name(m) returns a short string for mode m (or "?" for unused IDs); see the table below.

Modes

fc currently defines 50 mode IDs (0–49 used; 12, 14, and 50–63 reserved). Mode names as exposed by fc_mode_name:

 0 PRED                    25 PRED2
 1 CONST                   26 PRED_ADAPTIVE
 2 STRIDE                  27 VITERBI
 3 XORZ                    28 DELTA_BINNED
 4 LZ                      29 PRED_RC
 5 RAW                     30 PRED_INTERLEAVED
 6 FLOAT32                 31 BWT
 7 ORDERED_DELTA           32 LZ_DICT
 8 FUZZY_STRIDE            33 CONV_N
 9 ALP                     34 SIGN_CONV
10 TRAILING_ZERO_BP        35 CONV_DOUBLE
11 BYTE_TRANSPOSE          36 MTF_LZ
13 XOR128                  37 CONV_DOUBLE_BP
15 LSB_STRIP               38 CONV_N_BINNED
16 LOOKBACK_DELTA          39 PRED_SIMD_INTERLEAVED
17 FLOAT_MULT              40 FUZZY_STRIDE_ANS
18 FCM_RLE                 41 PAQ_MIXER
19 DICT                    42 PAQ4_MIXER
20 DELTA2                  43 BWT_MTF_TANS
21 BITPLANE                44 PRED4
22 INT_MULT                45 DELTA_DP_BINNED
23 CONV1                   46 CONV_N_DP_BINNED
24 PRED_TANS               47 ELF
                           48 LZ_SPLIT
                           49 BWT_MTF_RC

Roughly grouped:

  • Predictors: PRED, PRED2, PRED4, PRED_TANS, PRED_RC, PRED_ADAPTIVE, PRED_INTERLEAVED, PRED_SIMD_INTERLEAVED, VITERBI, LSB_STRIP — FCM/DFCM-style hash predictors with various residual coders (raw, tANS, range coding) and SIMD layouts.
  • XOR / delta: XORZ, XOR128, LOOKBACK_DELTA, ORDERED_DELTA, DELTA2, DELTA_BINNED, DELTA_DP_BINNED.
  • Constant / stride / dictionary: CONST, STRIDE, FUZZY_STRIDE, FUZZY_STRIDE_ANS, DICT, LZ_DICT, MTF_LZ.
  • Lempel-Ziv: LZ, LZ_SPLIT.
  • Floating-point specific: FLOAT32, FLOAT_MULT, INT_MULT, ALP (Adaptive Lossless Floating-point), ELF (Erasing Last bits).
  • Transforms: BYTE_TRANSPOSE, BITPLANE, TRAILING_ZERO_BP, SIGN_CONV, BWT, BWT_MTF_TANS, BWT_MTF_RC.
  • Convolutional / linear models: CONV1, CONV_N, CONV_DOUBLE, CONV_DOUBLE_BP, CONV_N_BINNED, CONV_N_DP_BINNED.
  • Mixers: PAQ_MIXER, PAQ4_MIXER, FCM_RLE.
  • Fallback: RAW.

fc.c is the source of truth for what each mode actually does.

How it works

  1. Header. The stream begins with a fixed-size header containing a magic number, the original byte count, the quantum size used, and the clamped predictor log2.
  2. Block planning. Input is scanned and split into blocks. The default quantum is 256 KiB (C_QUANTUM_BYTES) but a probe (ceq_probe_chunk_values) can grow a block up to 1 MiB (C_QUANTUM_MAX_BYTES) when the data looks low-entropy.
  3. Mode competition. Workers pull blocks from a queue. For each block, the encoder evaluates a feature-gated subset of the modes (exp_range, sign_flips, distinct_count, looks_like_repeats, and the running best decide which modes are worth trying) and keeps the smallest output. Per-mode wall time, call count, and win count are recorded in the fc_enc_mode_* counters.
  4. Emission. Each block is written as [1-byte mode][payload], with length and offset bookkeeping in the stream so fc_dec can find block boundaries without scanning the payload.
  5. Decode. fc_dec walks the block index and dispatches each block's payload to the matching codec. Like encode, decode is multi-threaded; fc_dec itself picks the worker count automatically (online CPUs, capped at 128 and at the block count).

Building

make            # builds fc.o, gorilla.o, and the test_fc benchmark
make test       # runs ./test_fc
make clean

Toolchain

  • A C11 compiler (clang or gcc). clang is the Makefile default.
  • POSIX threads.
  • pthread and libm (linked by default).

Required CPU features

The default CFLAGS enable -mavx2 -msse4.2 -mbmi -mlzcnt. The library uses these intrinsics directly; running the binary on a CPU without them will fault with SIGILL. There is no portable fallback.

Optional benchmark dependencies

Auto-detected by the Makefile and only used by test_fc.c for side-by-side comparisons — they are not needed to build or use the library itself:

  • libzstd (via pkg-config libzstd)
  • liblz4 (via pkg-config liblz4)
  • fpzip (header probe at /usr/include/fpzip.h, links -lfpzip)

Benchmark harness

test_fc runs each of 17 synthetic generators (constant, linear, parabolic, AR(2)-damped, piecewise, integer-multiples, decimal-cents, dict-16, sin-low-freq, audio-mix, random-walk, quantized-4lvl, climate, geo-coords, stocks, sensor-noisy, pseudo-random) at 1 Mi doubles per dataset (overridable via FCBENCH_N), 5 trials timed at median, 8 encoder threads (compile-time THREADS), and writes test_fc.csv with compression ratio and encode/decode throughput against the optional baselines (zstd, lz4, fpzip, gorilla) that were detected at build time.

Round-trip correctness is checked on every dataset. A MISMATCH line in the output is a regression.

Limitations

  • x86-64 only; no ARM/NEON path. AVX2 + SSE4.2 + BMI + LZCNT required at run time.
  • fc_dec thread count is auto-selected and not user-configurable (the decode API takes no fc_cfg).
  • cfg.c is reserved and currently a no-op ((void)cfg.c;).
  • The encoder requires bytes to be a multiple of 8 (one full double).
  • The predictor parameter cfg.p is silently clamped to [10, 16].
  • Unknown mode IDs in a compressed stream are not flagged as errors; affected blocks decode to zeros.
  • The on-disk format is not stable across major versions.

License

This project's own code (fc.c, fc.h, test_fc.c, Makefile, documentation, and build infrastructure) is licensed under the Apache License, Version 2.0. See LICENSE.

The bundled files gorilla.c and gorilla.h are not Apache 2.0. They are © Redis Ltd. and remain under their original triple license: RSALv2, SSPLv1, or AGPLv3 (your choice). See the file headers and NOTICE for the full text and attribution. Downstream users who cannot accept any of those three licenses should rebuild without gorilla.o (the core fc library does not link against it; only the test_fc benchmark does).

Contributing

See CONTRIBUTING.md. Contributions to this project's own files are accepted under Apache 2.0; please sign off your commits (git commit -s).

Contact

  • Maintainer: Praveen Vaddadi — thynktank@gmail.com
  • Security reports: same address; please do not file public issues for vulnerabilities.

Releases

No releases published

Packages

 
 
 

Contributors