| name | CQL Type System & Schema Handling |
| description | Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation. |
| allowed-tools | Read, Grep, Glob |
CQL Type System & Schema Handling
This skill provides guidance on implementing the Cassandra CQL type system with schema-provided deserialization.
When to Use This Skill
- Implementing CQL type deserializers
- Parsing collection types (list, set, map)
- Handling User-Defined Types (UDTs)
- Working with frozen vs non-frozen types
- Tuple deserialization
- Schema validation
- Type-correct data generation
Core Principles
Schema-Provided Deserialization
Per the PRD, the schema is passed in, never inferred:
```rust
// Schema provides type information
fn deserialize_cell(
    data: &[u8],
    column_type: &CqlType, // From schema
) -> Result<CqlValue>
```
Never infer a type from the data alone; always use the schema.
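The same four bytes `00 00 00 2A` read as 42 when typed as an int and as a tiny subnormal when typed as a float. Typed as a date, they land millions of years before 1970. The sketch below shows schema-driven dispatch for that case. It assumes a cut-down `CqlType`/`CqlValue` with only three variants and a `String` error type. These are illustrative, not the crate's real definitions.

```rust
/// Illustrative subset of the schema type enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CqlType {
    Int,
    Float,
    Date,
}

/// Illustrative decoded value.
#[derive(Debug, PartialEq)]
enum CqlValue {
    Int(i32),
    Float(f32),
    /// Days relative to 1970-01-01 (negative = before the epoch).
    Date(i64),
}

/// Dispatches purely on the schema type; the bytes alone are ambiguous.
fn deserialize_cell(data: &[u8], column_type: &CqlType) -> Result<CqlValue, String> {
    let raw: [u8; 4] = data
        .try_into()
        .map_err(|_| format!("expected 4 bytes, got {}", data.len()))?;
    Ok(match column_type {
        CqlType::Int => CqlValue::Int(i32::from_be_bytes(raw)),
        CqlType::Float => CqlValue::Float(f32::from_be_bytes(raw)),
        // date is an unsigned day count with 1970-01-01 stored as 2^31.
        CqlType::Date => CqlValue::Date(u32::from_be_bytes(raw) as i64 - (1i64 << 31)),
    })
}
```

In this sketch, the only thing that changes between the three results is the `column_type` argument.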
CQL Type Categories
1. Primitive Types
Fixed-Size Primitives
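- `boolean`: 1 byte (0x00 = false, 0x01 = true)
- `tinyint`: 1 byte, signed
- `smallint`: 2 bytes, signed, big-endian
- `int`: 4 bytes, signed, big-endian
- `bigint`: 8 bytes, signed, big-endian
- `float`: 4 bytes, IEEE 754, big-endian
- `double`: 8 bytes, IEEE 754, big-endian
- `date`: 4 bytes, unsigned day count with the Unix epoch (1970-01-01) stored as 2^31
- `time`: 8 bytes, signed nanoseconds since midnight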
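Fixed-size types can be decoded strictly: reject any length other than the declared width, which surfaces schema mismatches early. The sketch below is std-only with `String` errors. The helper `fixed` and the `decode_*` names are hypothetical. The `time` check uses the valid range of one day, 0 to 86,399,999,999,999 ns.

```rust
/// Copies exactly N bytes into an array, rejecting any other length.
fn fixed<const N: usize>(data: &[u8]) -> Result<[u8; N], String> {
    data.try_into()
        .map_err(|_| format!("expected {} bytes, got {}", N, data.len()))
}

fn decode_boolean(data: &[u8]) -> Result<bool, String> {
    // Lenient: any non-zero byte is treated as true.
    Ok(fixed::<1>(data)?[0] != 0)
}

fn decode_tinyint(data: &[u8]) -> Result<i8, String> {
    Ok(i8::from_be_bytes(fixed(data)?))
}

fn decode_smallint(data: &[u8]) -> Result<i16, String> {
    Ok(i16::from_be_bytes(fixed(data)?))
}

fn decode_bigint(data: &[u8]) -> Result<i64, String> {
    Ok(i64::from_be_bytes(fixed(data)?))
}

fn decode_double(data: &[u8]) -> Result<f64, String> {
    Ok(f64::from_be_bytes(fixed(data)?))
}

/// time: signed nanoseconds since midnight, restricted to one day.
fn decode_time(data: &[u8]) -> Result<i64, String> {
    let nanos = i64::from_be_bytes(fixed(data)?);
    if !(0..86_400_000_000_000).contains(&nanos) {
        return Err(format!("time out of range: {nanos}"));
    }
    Ok(nanos)
}
```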
Variable-Size Primitives
- `text`/`varchar`: UTF-8 encoded string
- `blob`: raw bytes
- `ascii`: ASCII-only string
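The variable-size text types differ only in how the bytes are validated. The std-only sketch below borrows from the input instead of copying it. The function names are hypothetical.

```rust
/// text/varchar: must be valid UTF-8; the result borrows `data`.
fn decode_text(data: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(data)
}

/// ascii: every byte must be below 0x80.
fn decode_ascii(data: &[u8]) -> Result<&str, String> {
    if let Some(pos) = data.iter().position(|b| !b.is_ascii()) {
        return Err(format!("non-ASCII byte at offset {pos}"));
    }
    // ASCII is a subset of UTF-8, so this conversion cannot fail.
    Ok(std::str::from_utf8(data).expect("ASCII is valid UTF-8"))
}

/// blob: any byte sequence is valid; return it unchanged.
fn decode_blob(data: &[u8]) -> &[u8] {
    data
}
```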
Special Primitives
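- `uuid`/`timeuuid`: 16 bytes (a `timeuuid` must be a version 1 UUID)
- `inet`: 4 bytes (IPv4) or 16 bytes (IPv6)
- `varint`: variable-length, big-endian two's-complement integer
- `decimal`: scale (4-byte big-endian int) followed by an unscaled `varint`
- `duration`: months, days, nanoseconds (3 signed VInts)
- `timestamp`: 8 bytes, signed, big-endian milliseconds since the Unix epoch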
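The special types need the most care. The std-only sketch below rests on two assumptions. First, varints fit in 16 bytes; real values can be wider, and a big-integer crate would lift that limit. Second, it assumes Cassandra's VInt layout: the number of leading 1-bits in the first byte is the count of extra bytes, and signed VInts are zigzag-encoded. All function names are hypothetical.

```rust
/// varint: big-endian two's complement, minimal length.
/// Sketch limit: values wider than 16 bytes are rejected.
fn decode_varint(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Start from all ones for negatives so the sign extends correctly.
    let mut value: i128 = if bytes[0] & 0x80 != 0 { -1 } else { 0 };
    for &b in bytes {
        value = (value << 8) | b as i128;
    }
    Some(value)
}

/// decimal: 4-byte big-endian scale, then the unscaled varint.
/// Returns (unscaled, scale); the value is unscaled * 10^(-scale).
fn decode_decimal(bytes: &[u8]) -> Option<(i128, i32)> {
    if bytes.len() < 5 {
        return None;
    }
    let scale = i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    Some((decode_varint(&bytes[4..])?, scale))
}

/// Unsigned VInt: leading 1-bits of the first byte = number of extra bytes;
/// the first byte's remaining bits are the value's high bits.
fn read_unsigned_vint(data: &[u8]) -> Option<(u64, &[u8])> {
    let first = *data.first()?;
    let extra = first.leading_ones() as usize;
    if data.len() < 1 + extra {
        return None;
    }
    let mask: u8 = if extra >= 8 { 0 } else { 0xFF >> extra };
    let mut value = u64::from(first & mask);
    for &b in &data[1..1 + extra] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, &data[1 + extra..]))
}

/// Signed VInt: zigzag-decode the unsigned form.
fn read_signed_vint(data: &[u8]) -> Option<(i64, &[u8])> {
    let (n, rest) = read_unsigned_vint(data)?;
    Some((((n >> 1) as i64) ^ -((n & 1) as i64), rest))
}

/// duration: months, days, nanoseconds as three consecutive signed VInts.
fn decode_duration(data: &[u8]) -> Option<(i32, i32, i64)> {
    let (months, rest) = read_signed_vint(data)?;
    let (days, rest) = read_signed_vint(rest)?;
    let (nanos, _) = read_signed_vint(rest)?;
    Some((i32::try_from(months).ok()?, i32::try_from(days).ok()?, nanos))
}
```

For example, the bytes `02 01 87 D0` decode to a duration of 1 month, -1 day, and 1000 ns.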
2. Collection Types
See collections-and-udts.md for detailed format.
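Collection Format (frozen collections, which are serialized as a single value; non-frozen collections are stored in SSTables as one cell per element):
```
[4 bytes: element_count (big-endian)]
[for each element:]
  [4 bytes: element_size (big-endian)]
  [bytes: element_data]
```
For `map<K,V>`, element_count is the number of entries, and each entry is a size-prefixed key followed by a size-prefixed value.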
Types:
- `list<T>`: ordered by position, allows duplicates
- `set<T>`: unique elements, kept sorted by element value
- `map<K,V>`: unique keys, kept sorted by key
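The sketch below decodes the single-value (frozen) form of a `map<text, int>`. There, the leading count is the number of entries, and each entry is a length-prefixed key followed by a length-prefixed value. `list` and `set` use the same layout with one item per element. The function names and `String` errors are illustrative.

```rust
/// Reads one [int length][bytes] item; a negative length means null.
fn read_item(data: &[u8]) -> Result<(Option<&[u8]>, &[u8]), String> {
    if data.len() < 4 {
        return Err("truncated length prefix".into());
    }
    let len = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let rest = &data[4..];
    if len < 0 {
        return Ok((None, rest));
    }
    let len = len as usize;
    if rest.len() < len {
        return Err("truncated value".into());
    }
    let (value, rest) = rest.split_at(len);
    Ok((Some(value), rest))
}

/// Decodes [int entry_count] followed by key, value, key, value, ...
fn decode_text_int_map(data: &[u8]) -> Result<Vec<(String, i32)>, String> {
    if data.len() < 4 {
        return Err("truncated count".into());
    }
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let count = usize::try_from(count).map_err(|_| "negative count".to_string())?;
    let mut rest = &data[4..];
    let mut pairs = Vec::new();
    for _ in 0..count {
        let (key, after_key) = read_item(rest)?;
        let (value, after_value) = read_item(after_key)?;
        let key = key.ok_or("null map key")?;
        let value = value.ok_or("null map value")?;
        let key = std::str::from_utf8(key).map_err(|e| e.to_string())?.to_owned();
        let value: [u8; 4] = value.try_into().map_err(|_| "int must be 4 bytes".to_string())?;
        pairs.push((key, i32::from_be_bytes(value)));
        rest = after_value;
    }
    Ok(pairs)
}
```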
3. Tuple Types
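Format:
```
[for each component in schema order:]
  [4 bytes: component_size (-1 for null)]
  [bytes: component_data]
```
There is no component count; the schema determines how many components to read. Each component's data uses its own type's serialization.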
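In Cassandra's tuple serialization, each component is preceded by its own 4-byte length (-1 for null). There is no component count, because the schema supplies the arity. The sketch below splits a tuple into raw components under that layout. The name `split_tuple` is hypothetical. It rejects truncated input and trailing bytes.

```rust
/// Splits a serialized tuple into `arity` raw components (None = null).
fn split_tuple(data: &[u8], arity: usize) -> Result<Vec<Option<&[u8]>>, String> {
    let mut rest = data;
    let mut components = Vec::with_capacity(arity);
    for i in 0..arity {
        if rest.len() < 4 {
            return Err(format!("component {i}: truncated length prefix"));
        }
        let len = i32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]);
        rest = &rest[4..];
        if len < 0 {
            components.push(None); // null component
            continue;
        }
        let len = len as usize;
        if rest.len() < len {
            return Err(format!("component {i}: truncated value"));
        }
        let (value, tail) = rest.split_at(len);
        components.push(Some(value));
        rest = tail;
    }
    if !rest.is_empty() {
        return Err(format!("{} unexpected trailing bytes", rest.len()));
    }
    Ok(components)
}
```

Each `Some` slice would then be passed to the deserializer for that position's schema type.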
4. User-Defined Types (UDTs)
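Format:
```
[for each field in schema order:]
  [4 bytes: field_size (-1 for null, 0 for empty, >0 for data)]
  [if size > 0:]
    [bytes: field_data]
```
The UDT schema defines the field names, types, and order. A value may contain fewer fields than the schema lists (for example, fields added later with ALTER TYPE); missing trailing fields are read as null.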
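The sketch below pairs raw field bytes with schema field names. It keeps null (size -1) distinct from empty (size 0). It also treats missing trailing fields as null, as the native protocol allows for UDT values. The function name and signature are hypothetical.

```rust
/// Pairs each schema field name with its raw bytes (None = null).
fn split_udt<'a>(
    data: &'a [u8],
    field_names: &[&'a str],
) -> Result<Vec<(&'a str, Option<&'a [u8]>)>, String> {
    let mut rest = data;
    let mut fields = Vec::with_capacity(field_names.len());
    for &name in field_names {
        if rest.is_empty() {
            fields.push((name, None)); // missing trailing field => null
            continue;
        }
        if rest.len() < 4 {
            return Err(format!("field {name}: truncated length prefix"));
        }
        let len = i32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]);
        rest = &rest[4..];
        if len < 0 {
            fields.push((name, None)); // explicit null
            continue;
        }
        let len = len as usize;
        if rest.len() < len {
            return Err(format!("field {name}: truncated value"));
        }
        let (value, tail) = rest.split_at(len);
        fields.push((name, Some(value))); // may be empty (len == 0)
        rest = tail;
    }
    Ok(fields)
}
```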
5. Frozen vs Non-Frozen
Frozen types:
- Serialized as a single value (one cell)
- Cannot update individual elements
- Required for collections and UDTs used in primary keys
- Nested collections must be frozen
Non-frozen collections:
- Can update individual elements
- Only allowed at top level (not nested)
- Element deletions and whole-collection overwrites write tombstones
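The frozen rules can be enforced when a schema is loaded. The sketch below works over a reduced, illustrative `CqlType`. It accepts anything under `frozen<>`. It rejects a non-frozen collection nested inside another non-frozen collection. It also rejects non-frozen collections as primary-key column types.

```rust
#[derive(Debug)]
enum CqlType {
    Int,
    Text,
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Frozen(Box<CqlType>),
}

fn is_collection(ty: &CqlType) -> bool {
    matches!(ty, CqlType::List(_) | CqlType::Set(_) | CqlType::Map(_, _))
}

/// Validates a column type against the frozen rules.
fn validate_column_type(ty: &CqlType, is_primary_key: bool) -> Result<(), String> {
    if is_primary_key && is_collection(ty) {
        return Err(format!("primary key column must be frozen: {ty:?}"));
    }
    check_nested(ty, false)
}

/// `nested` is true once we are inside a non-frozen collection.
fn check_nested(ty: &CqlType, nested: bool) -> Result<(), String> {
    match ty {
        // Everything under frozen<> is one serialized value, so nesting is fine.
        CqlType::Frozen(_) | CqlType::Int | CqlType::Text => Ok(()),
        CqlType::List(inner) | CqlType::Set(inner) => {
            if nested {
                return Err(format!("nested collection must be frozen: {ty:?}"));
            }
            check_nested(inner, true)
        }
        CqlType::Map(key, value) => {
            if nested {
                return Err(format!("nested collection must be frozen: {ty:?}"));
            }
            check_nested(key, true)?;
            check_nested(value, true)
        }
    }
}
```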
Type Deserialization Patterns
Zero-Copy Pattern
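```rust
use bytes::Bytes;

// `Result` is the crate's result alias; its error type is assumed to
// implement From<std::str::Utf8Error>.

fn deserialize_text(data: &[u8]) -> Result<&str> {
    // Zero-copy: validate UTF-8 and borrow the input as &str.
    // Call .to_owned() at the use site only if an owned String is needed.
    Ok(std::str::from_utf8(data)?)
}

fn deserialize_blob(data: Bytes) -> Result<Bytes> {
    // Zero-copy: Bytes is reference-counted, so returning it shares the buffer.
    Ok(data)
}
```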
Length-Prefixed Pattern
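```rust
// `Result` and `Error::NotEnoughBytes` are the crate's own types.

/// Reads a 4-byte big-endian length followed by that many bytes.
/// Returns `None` for a negative length (null), which is distinct from a
/// zero-length (empty) value. The value is borrowed, not copied.
fn deserialize_length_prefixed(data: &[u8]) -> Result<(Option<&[u8]>, &[u8])> {
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    let size = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let rest = &data[4..];
    if size < 0 {
        return Ok((None, rest)); // Null
    }
    let size = size as usize;
    if rest.len() < size {
        return Err(Error::NotEnoughBytes);
    }
    let (value, remaining) = rest.split_at(size);
    Ok((Some(value), remaining))
}
```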
Collection Pattern
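```rust
// `CqlValue` and `deserialize_value` are assumed crate items;
// `Error::InvalidData` is a hypothetical error variant.
fn deserialize_list(
    data: &[u8],
    element_type: &CqlType,
) -> Result<Vec<CqlValue>> {
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    // A negative count is corrupt data, not a huge usize.
    let count = usize::try_from(count).map_err(|_| Error::InvalidData)?;
    let mut remaining = &data[4..];
    // Each element needs at least a 4-byte prefix, which bounds the allocation.
    let mut elements = Vec::with_capacity(count.min(remaining.len() / 4));
    for _ in 0..count {
        let (element_data, rest) = deserialize_length_prefixed(remaining)?;
        // CQL collections cannot contain null elements.
        let element_data = element_data.ok_or(Error::InvalidData)?;
        elements.push(deserialize_value(element_data, element_type)?);
        remaining = rest;
    }
    Ok(elements)
}
```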
Schema Handling
Schema Sources
- Statistics.db: Serialization header with column definitions
- System tables: `system_schema.tables`, `system_schema.columns`
- CQL schema file: for test data generation
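`system_schema.columns` stores each column's type as a CQL type string, such as `frozen<map<text, int>>`. Loading schema from the system tables therefore means parsing that string. The sketch below is a minimal recursive-descent parser. Its `Named` variant lumps primitives and UDT names together, which is a simplification for illustration. Quoted identifiers and custom types are not handled.

```rust
#[derive(Debug, PartialEq)]
enum CqlType {
    Named(String), // primitive or UDT name, e.g. "int" or "address"
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Tuple(Vec<CqlType>),
    Frozen(Box<CqlType>),
}

fn parse_type(s: &str) -> Result<CqlType, String> {
    let (ty, rest) = parse_inner(s)?;
    if !rest.trim().is_empty() {
        return Err(format!("unexpected trailing input {rest:?}"));
    }
    Ok(ty)
}

/// Parses one type from the front of `s`, returning it and the unparsed rest.
fn parse_inner(s: &str) -> Result<(CqlType, &str), String> {
    let s = s.trim_start();
    // A type name ends at '<', ',', '>' or end of input.
    let end = s.find(|c: char| c == '<' || c == ',' || c == '>').unwrap_or(s.len());
    let name = s[..end].trim();
    if name.is_empty() {
        return Err(format!("expected a type name at {s:?}"));
    }
    let mut rest = match s[end..].strip_prefix('<') {
        Some(after_open) => after_open,
        None => return Ok((CqlType::Named(name.to_string()), &s[end..])),
    };
    // Comma-separated type arguments up to the matching '>'.
    let mut args = Vec::new();
    loop {
        let (arg, tail) = parse_inner(rest)?;
        args.push(arg);
        let tail = tail.trim_start();
        if let Some(t) = tail.strip_prefix(',') {
            rest = t;
        } else if let Some(t) = tail.strip_prefix('>') {
            rest = t;
            break;
        } else {
            return Err(format!("expected ',' or '>' at {tail:?}"));
        }
    }
    let keyword = name.to_ascii_lowercase();
    let ty = match (keyword.as_str(), args.len()) {
        ("list", 1) => CqlType::List(Box::new(args.remove(0))),
        ("set", 1) => CqlType::Set(Box::new(args.remove(0))),
        ("frozen", 1) => CqlType::Frozen(Box::new(args.remove(0))),
        ("map", 2) => {
            let value = args.pop().unwrap();
            let key = args.pop().unwrap();
            CqlType::Map(Box::new(key), Box::new(value))
        }
        ("tuple", _) => CqlType::Tuple(args),
        (other, n) => return Err(format!("unsupported type {other}<...> with {n} argument(s)")),
    };
    Ok((ty, rest))
}
```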
Schema Representation
```rust
struct TableSchema {
    keyspace: String,
    table: String,
    partition_keys: Vec<ColumnDef>,
    clustering_keys: Vec<ColumnDef>,
    regular_columns: Vec<ColumnDef>,
    static_columns: Vec<ColumnDef>,
}

struct ColumnDef {
    name: String,
    cql_type: CqlType,
}

enum CqlType {
    // Primitives
    Boolean,
    Int,
    BigInt,
    Text,
    Uuid,
    Timestamp,
    // ... more primitives

    // Collections
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),

    // Complex
    Tuple(Vec<CqlType>),
    Udt(UdtDef),

    // Modifiers
    Frozen(Box<CqlType>),
}
```
PRD Alignment
Supports Milestone M1 (Core Reading Library):
- All CQL types including collections & UDTs
- Schema-provided deserialization (not inferred)
- Zero-copy patterns where possible
Supports Milestone M5 (Write Support):
- Type-correct serialization
- Schema validation
Common Pitfalls
1. Inferring Types
❌ Wrong: Look at data to guess type
✅ Right: Use schema to know type
2. Copying Unnecessarily
❌ Wrong: `Vec<u8>` for every field
✅ Right: `Bytes` with zero-copy slicing
3. Ignoring Null Handling
❌ Wrong: Assume all fields present
✅ Right: Check for null (-1 size prefix)
4. Frozen Semantics
❌ Wrong: Try to update frozen collection elements
✅ Right: Replace entire frozen value
5. Nested Collections
❌ Wrong: Allow non-frozen nested collections
✅ Right: Nested collections must be frozen
Type System References
Detailed specifications in:
- cql-types-reference.md - Complete type catalog
- collections-and-udts.md - Collection and UDT formats
Testing
Generate type-correct test data:
```bash
# Use test-data-management skill for Docker-based generation
cd test-data
./scripts/start-clean.sh
./scripts/generate.sh
```
Validate parsing against sstabledump, which accepts one SSTable Data.db file per invocation:
```bash
for f in test-data/datasets/sstables/keyspace/table/*-Data.db; do
  sstabledump "$f"
done
```
Next Steps
When adding new type support:
- Add a variant to the `CqlType` enum
- Implement a deserializer, zero-copy where possible
- Add serializer (for M5 write support)
- Create property tests with edge cases
- Generate test data with type
- Validate against sstabledump
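For the property-test step, one lightweight starting point is a round trip over boundary values. The sketch below uses `varint`, because its minimal two's-complement encoding has sign-byte edge cases. The encoder, the decoder, and the edge list are illustrative. A property-testing crate such as proptest could generate inputs instead of the fixed list.

```rust
/// Encodes an i128 as a varint: big-endian two's complement, minimal length.
fn encode_varint(value: i128) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    let mut start = 0;
    // Drop redundant sign bytes: 0x00 before a byte with its high bit clear,
    // or 0xFF before a byte with its high bit set.
    while start < 15 {
        let (cur, next) = (bytes[start], bytes[start + 1]);
        let redundant = (cur == 0x00 && next & 0x80 == 0) || (cur == 0xFF && next & 0x80 != 0);
        if !redundant {
            break;
        }
        start += 1;
    }
    bytes[start..].to_vec()
}

/// Decodes a varint of at most 16 bytes, sign-extending from the first byte.
fn decode_varint(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    let mut value: i128 = if bytes[0] & 0x80 != 0 { -1 } else { 0 };
    for &b in bytes {
        value = (value << 8) | b as i128;
    }
    Some(value)
}

/// Round-trips boundary values and reports the first mismatch.
fn check_varint_roundtrip() -> Result<(), String> {
    let edges: [i128; 13] = [
        0, 1, -1, 127, 128, -128, -129, 255, 256,
        i64::MAX as i128, i64::MIN as i128, i128::MAX, i128::MIN,
    ];
    for &v in edges.iter() {
        let encoded = encode_varint(v);
        match decode_varint(&encoded) {
            Some(d) if d == v => {}
            other => return Err(format!("{v}: encoded {encoded:?}, decoded {other:?}")),
        }
    }
    Ok(())
}
```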