Layker
Lakehouse-Aligned YAML Kit for Engineering Rules
Declarative table metadata control for Databricks & Spark.
Layker turns a YAML spec into safe, validated DDL with a built-in audit log.
If nothing needs to change, Layker exits cleanly. If something must change, you'll see it first.
Quick Navigation
- What is Layker?
- Installation
- Quickstart
- How it works
- Audit log model
- Modes & parameters
- Serverless & classic
- Repository layout
- Troubleshooting
- Contributing & License
What is Layker?
Layker is a Python package for managing table DDL, metadata, and auditing with a single YAML file as the source of truth.
Highlights
- Declarative: author schemas, tags, constraints, and properties in YAML.
- Diff-first: Layker computes a diff against the live table; "no diff" means no work.
- Safe evolution: add/rename/drop column intents are detected and gated by required Delta properties.
- Auditable: every applied change is logged with before/after snapshots and a concise differences dictionary.
- Works on serverless or classic clusters: unsupported operations are avoided automatically.
Installation
Stable:
Latest (main):
pip install "git+https://github.com/Levi-Gagne/layker.git"
Python 3.8+ and Spark 3.3+ are recommended. If you already have PySpark on the cluster, Layker will use it.
Quickstart
1) Author a YAML spec
Minimal example (save as src/layker/resources/example.yaml):
catalog: dq_dev
schema: lmg_sandbox
table: layker_test
columns:
  1:
    name: id
    datatype: bigint
    nullable: false
    active: true
  2:
    name: name
    datatype: string
    nullable: true
    active: true
table_comment: Demo table managed by Layker
table_properties:
  delta.columnMapping.mode: "name"
  delta.minReaderVersion: "2"
  delta.minWriterVersion: "5"
primary_key: [id]
tags:
  domain: demo
  owner: team-data
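To make the "YAML spec into DDL" idea concrete, here is a minimal sketch of how a spec like the one above could be rendered as a CREATE TABLE statement. It is not Layker's generator: `render_create_table` is a hypothetical helper, the spec is written as a plain dict rather than loaded from YAML, and only columns, the table comment, and table properties are handled. Primary keys, tags, and the `active` flag are ignored because their exact handling is not described here.

```python
from typing import Any, Dict

# Plain-dict mirror of the example spec (assumption: Layker itself reads the YAML file).
spec: Dict[str, Any] = {
    "catalog": "dq_dev",
    "schema": "lmg_sandbox",
    "table": "layker_test",
    "columns": {
        1: {"name": "id", "datatype": "bigint", "nullable": False, "active": True},
        2: {"name": "name", "datatype": "string", "nullable": True, "active": True},
    },
    "table_comment": "Demo table managed by Layker",
    "table_properties": {
        "delta.columnMapping.mode": "name",
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
    },
}


def render_create_table(spec: Dict[str, Any]) -> str:
    """Hypothetical helper: render a CREATE TABLE statement from a spec dict."""
    fqn = f"{spec['catalog']}.{spec['schema']}.{spec['table']}"
    column_lines = []
    # Columns are keyed by position, so sort the keys to keep the declared order.
    for position in sorted(spec["columns"]):
        col = spec["columns"][position]
        not_null = "" if col.get("nullable", True) else " NOT NULL"
        column_lines.append(f"  {col['name']} {col['datatype'].upper()}{not_null}")
    ddl = f"CREATE TABLE IF NOT EXISTS {fqn} (\n" + ",\n".join(column_lines) + "\n) USING DELTA"
    comment = spec.get("table_comment")
    if comment:
        # No quote escaping here; this sketch assumes the comment has no single quotes.
        ddl += f"\nCOMMENT '{comment}'"
    props = spec.get("table_properties") or {}
    if props:
        rendered = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
        ddl += f"\nTBLPROPERTIES ({rendered})"
    return ddl


ddl = render_create_table(spec)
print(ddl)
```

The point of the sketch is only that every clause of the statement is derived from a key in the spec, which is what makes a diff against the live table possible.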
2) Sync from Python
from pyspark.sql import SparkSession
from layker.main import run_table_load

spark = SparkSession.builder.appName("layker").getOrCreate()

run_table_load(
    yaml_path="src/layker/resources/example.yaml",
    env="prd",
    dry_run=False,
    mode="all",            # validate | diff | apply | all
    audit_log_table=True,  # True=default audit YAML, False=disable, or str path to an audit YAML
)
3) Or via CLI
python -m layker src/layker/resources/example.yaml prd false all true
When audit_log_table=True, Layker uses the packaged default: layker/resources/layker_audit.yaml.
You can also pass a custom YAML path. Either way, the YAML defines the audit table's location.
How it works (at a glance)
- Validate YAML: fail fast with exact reasons, or proceed.
- Snapshot live table (if it exists).
- Compute differences between YAML snapshot and table snapshot.
- If there are no changes (i.e., the diff contains only full_table_name), exit with a success message; no audit row is written.
- Validate differences (schema-evolution preflight):
  - Detects add/rename/drop column intents.
  - Requires Delta properties for evolution:
    - delta.columnMapping.mode = name
    - delta.minReaderVersion = 2
    - delta.minWriterVersion = 5
  - On missing requirements, prints details and exits.
- Apply changes (create/alter) using generated SQL.
- Audit (only if changes were applied and auditing is enabled):
  - Writes a row containing:
    - before_value (JSON), differences (JSON), after_value (JSON)
    - change_category (create or update)
    - change_key (human-readable sequence per table)
    - env, yaml_path, fqn, timestamps, actor, etc.
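The diff, no-op, and preflight steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Layker's code: `compute_diff`, `is_noop`, and `missing_evolution_props` are hypothetical names, snapshots are modeled as flat dicts, and the two protocol versions are treated as minimums.

```python
from typing import Any, Dict, List


def compute_diff(fqn: str, desired: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical diff: always carries full_table_name, plus one entry per changed key."""
    diff: Dict[str, Any] = {"full_table_name": fqn}
    for key in sorted(set(desired) | set(live)):
        if desired.get(key) != live.get(key):
            diff[key] = {"from": live.get(key), "to": desired.get(key)}
    return diff


def is_noop(diff: Dict[str, Any]) -> bool:
    """A diff that contains only full_table_name means there is nothing to apply."""
    return set(diff) == {"full_table_name"}


def missing_evolution_props(table_properties: Dict[str, Any]) -> List[str]:
    """List the Delta properties that block add/rename/drop column changes."""
    problems: List[str] = []
    if table_properties.get("delta.columnMapping.mode") != "name":
        problems.append("delta.columnMapping.mode must be 'name'")
    # Assumption: the listed versions are lower bounds rather than exact values.
    for key, minimum in (("delta.minReaderVersion", 2), ("delta.minWriterVersion", 5)):
        raw = table_properties.get(key)
        if raw is None or int(raw) < minimum:
            problems.append(f"{key} must be at least {minimum}")
    return problems


fqn = "dq_dev.lmg_sandbox.layker_test"
live = {"columns": {"id": "bigint"}, "table_comment": "Demo table managed by Layker"}
desired = {"columns": {"id": "bigint", "name": "string"}, "table_comment": "Demo table managed by Layker"}

diff = compute_diff(fqn, desired, live)
if is_noop(diff):
    print("No changes; nothing to apply and no audit row.")
else:
    problems = missing_evolution_props({"delta.columnMapping.mode": "name"})
    print("Proposed changes:", diff)
    print("Preflight problems:", problems)
```

Keeping full_table_name in every diff means the no-op check is a simple key comparison rather than a test for an empty dict.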
Audit log model
The default audit YAML (layker/resources/layker_audit.yaml) defines these columns (in order):
- change_id: UUID per row
- run_id: optional job/run identifier
- env: environment/catalog prefix
- yaml_path: the source YAML path that initiated the change
- fqn: fully qualified table name
- change_category: create or update (based on whether a "before" snapshot was present)
- change_key: readable sequence per table:
  - First ever create: create-1
  - Subsequent updates on that lineage: create-1~update-1, create-1~update-2, ...
  - If the table is later dropped and re-created, the next lineage becomes create-2, etc.
- before_value: JSON snapshot before change (may be null on first create)
- differences: JSON diff dict that was applied
- after_value: JSON snapshot after change
- notes: optional free text
- created_at / created_by / updated_at / updated_by
Uniqueness expectation: (fqn, change_key) is effectively unique over time.
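The change_key sequence described above can be derived from the previous key for the same fqn. The sketch below shows one way to do that; `next_change_key` is a hypothetical function, and it assumes the most recent key for the table is available and that a missing "before" snapshot signals a new lineage.

```python
import re
from typing import Optional

# Matches "create-N" or "create-N~update-M".
_KEY_RE = re.compile(r"^create-(\d+)(?:~update-(\d+))?$")


def next_change_key(previous_key: Optional[str], had_before_snapshot: bool) -> str:
    """Return the next change_key for a table.

    previous_key: most recent change_key logged for this fqn, or None if there is none.
    had_before_snapshot: True when the table existed before the change (an update).
    """
    if previous_key is None:
        # No audit history yet: start the first lineage. How a pre-existing but
        # never-audited table should be keyed is not specified, so this is an assumption.
        return "create-1"
    match = _KEY_RE.match(previous_key)
    if match is None:
        raise ValueError(f"Unrecognized change_key: {previous_key!r}")
    lineage = int(match.group(1))
    update = int(match.group(2) or 0)
    if not had_before_snapshot:
        # Table was dropped and re-created: open the next lineage.
        return f"create-{lineage + 1}"
    return f"create-{lineage}~update-{update + 1}"


key = None
for existed in (False, True, True, False):
    key = next_change_key(key, existed)
    print(key)
```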
Modes & parameters
- mode: validate | diff | apply | all
  - validate: only YAML validation (exits on success)
  - diff: prints proposed changes and exits
  - apply: applies changes only
  - all: validate → diff → apply → audit
- audit_log_table:
  - False: disable auditing
  - True: use default layker/resources/layker_audit.yaml
  - str: path to a custom audit YAML (the YAML governs the destination table)
- No-op safety: if there are no changes, Layker exits early and skips audit.
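As an illustration of the parameter rules above, the following sketch resolves audit_log_table to a YAML path (or None when auditing is off) and rejects unknown modes. `resolve_audit_yaml` and `check_mode` are hypothetical helpers written for this example; Layker's own parameter validation may differ.

```python
from typing import Optional, Union

DEFAULT_AUDIT_YAML = "layker/resources/layker_audit.yaml"
VALID_MODES = ("validate", "diff", "apply", "all")


def resolve_audit_yaml(audit_log_table: Union[bool, str]) -> Optional[str]:
    """Map the audit_log_table parameter to an audit YAML path, or None if disabled."""
    if audit_log_table is False:
        return None
    if audit_log_table is True:
        return DEFAULT_AUDIT_YAML
    if isinstance(audit_log_table, str) and audit_log_table.strip():
        return audit_log_table
    raise ValueError("audit_log_table must be True, False, or a non-empty path string")


def check_mode(mode: str) -> str:
    """Return the mode unchanged if it is one of the documented values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


print(resolve_audit_yaml(True))
print(resolve_audit_yaml(False))
print(resolve_audit_yaml("conf/my_audit.yaml"))  # hypothetical custom path
print(check_mode("diff"))
```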
Serverless & classic environments
Layker is compatible with Databricks Serverless and classic clusters. If an operation isn't supported on serverless, Layker automatically avoids it and continues with the rest of the flow.
Repository layout
For the full tree, see docs/tree.txt.
Condensed layout:
```
layker/
├── .github/
│   └── workflows/
│       └── workflow.yaml
│
├── archive/
│   ├── main.py
│   ├── sanitizer.py
│   ├── snapshot_yaml.py
│   ├── steps_audit.py
│   ├── steps_differences.py
│   ├── steps_loader.py
│   ├── validate.py
│   ├── validators_evolution.py
│   └── yaml.py
│
├── docs/
│   ├── audit.md
│   ├── differences.txt
│   ├── FAQ
│   ├── FLOW
│   ├── future_enhancements.txt
│   ├── snapshot.txt
│   └── tree.txt
│
├── src/
│   ├── layker/
│   │   ├── resources/
│   │   │   ├── config_driven_table_example.yaml
│   │   │   ├── example.yaml
│   │   │   ├── layker_audit.yaml
│   │   │   └── layker_test.yaml
│   │   │
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── color.py
│   │   │   ├── dry_run.py
│   │   │   ├── paths.py
│   │   │   ├── printer.py
│   │   │   ├── spark.py
│   │   │   ├── table.py
│   │   │   ├── timer.py
│   │   │   └── yaml_table_dump.py
│   │   │
│   │   ├── validators/
│   │   │   ├── __init__.py
│   │   │   ├── differences.py
│   │   │   └── params.py
│   │   │
│   │   ├── __about__.py
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── differences.py
│   │   ├── loader.py
│   │   ├── logger.py
│   │   ├── main.py
│   │   ├── snapshot_table.py
│   │   └── snapshot_yaml.py
│   │
│   ├── dev_testing.ipynb
│   └── test_layker.ipynb
│
├── tests/
│   ├── __init__.py
│   ├── test_loader.py
│   └── test_main.py
│
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── requirements.txt
```
Troubleshooting
- Spark Connect / serverless: Layker avoids schema inference issues by using explicit schemas when writing the audit row.
- Single quotes in comments: Layker sanitizes YAML comments to avoid SQL quoting errors.
- No changes but I still see output: A diff containing only full_table_name means no change; Layker exits early with a success message and writes no audit row.
Contributing & License
PRs and issues welcome.
License: see LICENSE in the repo.
Built for engineers, by engineers.