API Reference

This page provides detailed documentation for the Quantize API.

quantize Module

Quantize - A simple Python library for quantizing floating point values to int4 values.

quantize.dequantize_from_int4(quantized_values: ndarray, scale: float, zero_point: float = 0) → ndarray

Dequantize int4 values back to floating point.

Parameters:
  • quantized_values – Array of quantized int4 values (stored as int8)

  • scale – Scaling factor used during quantization

  • zero_point – Zero point offset (usually 0 for symmetric quantization)

Returns:

Array of dequantized floating point values

quantize.quantize_to_int4(values: List[float] | ndarray, scale_method: str = 'minmax') → Tuple[ndarray, float, float]

Quantize floating point values to int4 values (4-bit integers).

Int4 values range from -8 to 7 (16 distinct values).

Parameters:
  • values – List or array of floating point values to quantize

  • scale_method – Method to determine scaling factor (‘minmax’ or ‘absmax’)

Returns:

Tuple of (quantized_values, scale, zero_point):
  • quantized_values – numpy array of int4 values (stored as int8)

  • scale – scaling factor used for quantization

  • zero_point – zero point offset (usually 0 for symmetric quantization)

quantize.quantize Module

Implementation of quantization functions for converting between floating point and int4 values.

quantize.quantize.dequantize_from_int4(quantized_values: ndarray, scale: float, zero_point: float = 0) → ndarray

Dequantize int4 values back to floating point.

Parameters:
  • quantized_values – Array of quantized int4 values (stored as int8)

  • scale – Scaling factor used during quantization

  • zero_point – Zero point offset (usually 0 for symmetric quantization)

Returns:

Array of dequantized floating point values

quantize.quantize.example()

Example demonstrating the quantization process.

quantize.quantize.pack_int4_to_int8(int4_values: ndarray) → ndarray

Pack two int4 values into each int8 value to save memory.

Parameters:

int4_values – Array of int4 values (stored as int8)

Returns:

Array of packed int8 values (half the length of input)

quantize.quantize.quantize_to_int4(values: List[float] | ndarray, scale_method: str = 'minmax') → Tuple[ndarray, float, float]

Quantize floating point values to int4 values (4-bit integers).

Int4 values range from -8 to 7 (16 distinct values).

Parameters:
  • values – List or array of floating point values to quantize

  • scale_method – Method to determine scaling factor (‘minmax’ or ‘absmax’)

Returns:

Tuple of (quantized_values, scale, zero_point):
  • quantized_values – numpy array of int4 values (stored as int8)

  • scale – scaling factor used for quantization

  • zero_point – zero point offset (usually 0 for symmetric quantization)

quantize.quantize.unpack_int8_to_int4(packed_values: ndarray) → ndarray

Unpack int8 values back into int4 values.

Parameters:

packed_values – Array of packed int8 values

Returns:

Array of unpacked int4 values (twice the length of input)

Core Functions

quantize_to_int4

The quantize_to_int4 function converts floating point values to int4 values (4-bit integers). Int4 values range from -8 to 7, providing 16 distinct values.

Parameters:
  • values (Union[List[float], np.ndarray]): List or array of floating point values to quantize

  • scale_method (str, optional): Method used to determine the scaling factor. Default is 'minmax'. Options are:
    • 'minmax': maps the minimum and maximum input values onto the int4 range (asymmetric quantization)

    • 'absmax': maps the largest absolute input value onto the int4 range (symmetric quantization)

Returns:
  • Tuple[np.ndarray, float, float]: A tuple containing:
    • quantized_values: numpy array of int4 values (stored as int8)

    • scale: scaling factor used for quantization

    • zero_point: zero point offset (usually 0 for symmetric quantization)
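The page does not spell out the formulas behind the two scaling methods. As a minimal sketch, assuming a standard affine scheme in which q = round(x / scale + zero_point) is clipped to [-8, 7], 'minmax' uses scale = (max - min) / 15 with a zero point that sends the minimum to -8, and 'absmax' uses scale = max|x| / 7 with zero_point = 0. The name quantize_to_int4_sketch is hypothetical, and the library's actual rounding and zero-point conventions may differ.

```python
from typing import List, Tuple, Union

import numpy as np

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range: 16 distinct levels


def quantize_to_int4_sketch(
    values: Union[List[float], np.ndarray], scale_method: str = "minmax"
) -> Tuple[np.ndarray, float, float]:
    """Hypothetical affine int4 quantizer; not the library's own code."""
    x = np.asarray(values, dtype=np.float64)
    if scale_method == "absmax":
        # Symmetric (assumed): the largest |x| maps to 7 and 0.0 maps to 0.
        max_abs = float(np.max(np.abs(x)))
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        zero_point = 0.0
    elif scale_method == "minmax":
        # Asymmetric (assumed): min(x) maps to -8 and max(x) maps to 7.
        lo, hi = float(np.min(x)), float(np.max(x))
        scale = (hi - lo) / (INT4_MAX - INT4_MIN) if hi > lo else 1.0
        zero_point = INT4_MIN - lo / scale
    else:
        raise ValueError(f"unknown scale_method: {scale_method!r}")
    # q = round(x / scale + zero_point), clipped to the int4 range.
    q = np.clip(np.round(x / scale + zero_point), INT4_MIN, INT4_MAX)
    return q.astype(np.int8), scale, zero_point
```

With 'absmax', the input [-3.5, 1.0, 3.5] gives scale = 0.5, zero_point = 0.0 and codes [-7, 2, 7]. In this symmetric variant the code -8 is never produced, so one of the 16 levels goes unused; 'minmax' uses the full range at the cost of a nonzero zero point.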

dequantize_from_int4

The dequantize_from_int4 function converts int4 values back to floating point.

Parameters:
  • quantized_values (np.ndarray): Array of quantized int4 values (stored as int8)

  • scale (float): Scaling factor used during quantization

  • zero_point (float, optional): Zero point offset. Default is 0.

Returns:
  • np.ndarray: Array of dequantized floating point values
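Dequantization inverts the affine mapping. A minimal sketch, assuming the convention q = round(x / scale + zero_point), so that x ≈ (q - zero_point) * scale; the helper name dequantize_from_int4_sketch is hypothetical, and the library may define zero_point differently.

```python
import numpy as np


def dequantize_from_int4_sketch(
    quantized_values: np.ndarray, scale: float, zero_point: float = 0
) -> np.ndarray:
    """Hypothetical inverse of q = round(x / scale + zero_point)."""
    # Widen to float before subtracting so int8 arithmetic cannot overflow.
    return (quantized_values.astype(np.float32) - zero_point) * scale
```

For the symmetric codes [-8, -1, 0, 7] with scale = 0.5 and zero_point = 0, this returns [-4.0, -0.5, 0.0, 3.5]. Under this convention, the reconstruction error for an input that was not clipped is at most scale / 2, because rounding moves each value by at most half a quantization step.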

Memory Optimization Functions

pack_int4_to_int8

The pack_int4_to_int8 function packs two int4 values into each int8 value to save memory.

Parameters:
  • int4_values (np.ndarray): Array of int4 values (stored as int8)

Returns:
  • np.ndarray: Array of packed int8 values (half the length of input)
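The nibble layout used by pack_int4_to_int8 is not documented. The sketch below assumes that element 2i goes in the low 4 bits and element 2i+1 in the high 4 bits of byte i, and that an odd-length input is padded with one zero. Both choices, and the name pack_int4_to_int8_sketch, are hypothetical.

```python
import numpy as np


def pack_int4_to_int8_sketch(int4_values: np.ndarray) -> np.ndarray:
    """Hypothetical layout: element 2*i -> low nibble, element 2*i+1 -> high nibble."""
    v = np.asarray(int4_values, dtype=np.int8)
    if v.size % 2:
        # Assumption: pad odd-length input with a zero so every byte holds two values.
        v = np.append(v, np.int8(0))
    # Reinterpret as unsigned and keep the low 4 bits (two's complement nibble, 0..15).
    nibbles = v.astype(np.uint8) & 0x0F
    packed = nibbles[0::2] | (nibbles[1::2] << 4)
    # Return as int8 to match the documented output type; bytes >= 128 read as negative.
    return packed.astype(np.int8)
```

Under this layout, [1, 2] packs to 0x21 (33), and [-1, -8] packs to 0x8F, which reads back as -113 when the byte is viewed as int8.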

unpack_int8_to_int4

The unpack_int8_to_int4 function unpacks int8 values back into int4 values.

Parameters:
  • packed_values (np.ndarray): Array of packed int8 values

Returns:
  • np.ndarray: Array of unpacked int4 values (twice the length of input)
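Unpacking reverses the packing and must sign-extend each 4-bit nibble, because nibbles 8 to 15 encode -8 to -1 in two's complement. This minimal sketch assumes a hypothetical low-nibble-first layout (element 2i in the low 4 bits of byte i); the name unpack_int8_to_int4_sketch is illustrative only.

```python
import numpy as np


def unpack_int8_to_int4_sketch(packed_values: np.ndarray) -> np.ndarray:
    """Hypothetical inverse of a low-nibble-first int4 packing layout."""
    # View the bytes as unsigned so shifts and masks operate on raw bits.
    b = np.asarray(packed_values, dtype=np.int8).astype(np.uint8)
    low = (b & 0x0F).astype(np.int8)
    high = (b >> 4).astype(np.int8)
    # Interleave: byte i yields elements 2*i (low nibble) and 2*i+1 (high nibble).
    out = np.empty(b.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = low, high
    # Sign-extend each nibble: 8..15 represent -8..-1 in two's complement.
    out[out > 7] -= 16
    return out
```

Unpacking [33, -113] yields [1, 2, -1, -8]. If the packed array came from an odd-length input that was zero-padded, the caller has to drop the trailing element to recover the original length.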

Example Function

example

The example function demonstrates the quantization process with sample values.

Returns:
  • Tuple[np.ndarray, np.ndarray, np.ndarray]: A tuple containing:
    • Original floating point values

    • Quantized int4 values

    • Dequantized floating point values
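The sample values used by example() are not listed here. The following self-contained sketch, a hypothetical stand-in named example_sketch, returns the same kind of triple (original, quantized, dequantized) using symmetric 'absmax'-style scaling with illustrative inputs, so the round-trip error can be inspected directly.

```python
import numpy as np


def example_sketch():
    """Hypothetical stand-in for quantize.quantize.example(); inputs are illustrative."""
    original = np.array([-1.2, -0.3, 0.0, 0.45, 1.2], dtype=np.float32)
    # Symmetric scale (assumed): the largest magnitude maps to the int4 code 7.
    scale = float(np.max(np.abs(original))) / 7
    quantized = np.clip(np.round(original / scale), -8, 7).astype(np.int8)
    # Map codes back to floats; each value lands within scale / 2 of its original.
    dequantized = quantized.astype(np.float32) * scale
    return original, quantized, dequantized
```

Comparing the first and third arrays element by element shows the quantization error directly; with only 16 levels, values that fall between steps, such as -0.3 and 0.45 here, are the ones that move most.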