layker

🐟 Layker 🐟
Lakehouse‑Aligned YAML Kit for Engineering Rules

Declarative table metadata control for Databricks & Spark.
Layker turns a YAML spec into safe, validated DDL with a built‑in audit log. If nothing needs to change, Layker exits cleanly. If something must change, you’ll see it first.

Quick Navigation

- What is Layker?
- Installation
- Quickstart
- How it works
- Audit log model
- Modes & parameters
- Serverless & classic
- Repository layout
- Troubleshooting
- Contributing & License

What is Layker?

Layker is a Python package for managing table DDL, metadata, and auditing with a single YAML file as the source of truth.

Highlights

Declarative – author schemas, tags, constraints, and properties in YAML.
Diff‑first – Layker computes a diff against the live table; “no diff” = no work.
Safe evolution – add/rename/drop column intents are detected and gated by required Delta properties.
Auditable – every applied change is logged with before/after snapshots and a concise differences dictionary.
Works in serverless or classic clusters – avoids unsupported operations automatically.

Installation

Stable:

pip install layker

Latest (main):

pip install "git+https://github.com/Levi-Gagne/layker.git"

Python 3.8+ and Spark 3.3+ are recommended. If you already have PySpark on the cluster, Layker will use it.

Quickstart

1) Author a YAML spec

Minimal example (save as src/layker/resources/example.yaml):

catalog: dq_dev
schema: lmg_sandbox
table: layker_test

columns:
  1:
    name: id
    datatype: bigint
    nullable: false
    active: true
  2:
    name: name
    datatype: string
    nullable: true
    active: true

table_comment: Demo table managed by Layker
table_properties:
  delta.columnMapping.mode: "name"
  delta.minReaderVersion: "2"
  delta.minWriterVersion: "5"

primary_key: [id]
tags:
  domain: demo
  owner: team-data

2) Sync from Python

from pyspark.sql import SparkSession
from layker.main import run_table_load

spark = SparkSession.builder.appName("layker").getOrCreate()

run_table_load(
    yaml_path="src/layker/resources/example.yaml",
    env="prd",
    dry_run=False,
    mode="all",                 # validate | diff | apply | all
    audit_log_table=True        # True=default audit YAML, False=disable, or str path to an audit YAML
)

3) Or via CLI

python -m layker src/layker/resources/example.yaml prd false all true

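The positional arguments follow the same order as the Python call above: yaml_path, env, dry_run, mode, audit_log_table.

When audit_log_table=True, Layker uses the packaged default: layker/resources/layker_audit.yaml.
You can also pass a custom YAML path. Either way, the YAML defines the audit table’s location.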

How it works (at a glance)

Validate YAML → fast fail with exact reasons, or proceed.
Snapshot live table (if it exists).
Compute differences between YAML snapshot and table snapshot.
- If there are no changes (the diff contains only full_table_name), Layker exits with a success message and writes no audit row.
Validate differences (schema‑evolution preflight):
- Detects add/rename/drop column intents.
- Requires Delta properties for evolution:
  - delta.columnMapping.mode = name
  - delta.minReaderVersion = 2
  - delta.minWriterVersion = 5
- On missing requirements, prints details and exits.
Apply changes (create/alter) using generated SQL.
Audit (only if changes were applied and auditing is enabled):
- Writes a row containing:
  - before_value (JSON), differences (JSON), after_value (JSON)
  - change_category (create or update)
  - change_key (human‑readable sequence per table)
  - env, yaml_path, fqn, timestamps, actor, etc.
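
The no-op check and the schema-evolution preflight described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not Layker's internal code: the function names `is_noop` and `missing_evolution_props` and the `columns_added` diff key are hypothetical, and the property comparison assumes an exact match on the values listed above.

```python
from typing import Any, Dict, List

# Delta properties required before column add/rename/drop, as listed above.
REQUIRED_EVOLUTION_PROPS = {
    "delta.columnMapping.mode": "name",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
}


def is_noop(diff: Dict[str, Any]) -> bool:
    """Return True when the diff holds nothing besides full_table_name."""
    return set(diff) <= {"full_table_name"}


def missing_evolution_props(table_properties: Dict[str, Any]) -> List[str]:
    """Return one readable reason per unmet requirement (empty list = OK).

    Assumption: values are compared as exact strings; the real check
    may instead accept higher reader/writer versions.
    """
    problems = []
    for key, expected in REQUIRED_EVOLUTION_PROPS.items():
        actual = table_properties.get(key)
        if str(actual) != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


# Hypothetical diff: the key "columns_added" is made up for illustration.
diff = {"full_table_name": "dq_dev.lmg_sandbox.layker_test",
        "columns_added": {3: {"name": "email", "datatype": "string"}}}

if is_noop(diff):
    print("No changes detected; nothing to apply.")
else:
    for reason in missing_evolution_props({"delta.columnMapping.mode": "name"}):
        print("Preflight failed:", reason)
```

Returning a list of reasons rather than a boolean keeps the "prints details and exits" behavior easy to implement in the caller.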

Audit log model

The default audit YAML (layker/resources/layker_audit.yaml) defines these columns (in order):

change_id – UUID per row
run_id – optional job/run identifier
env – environment/catalog prefix
yaml_path – the source YAML path that initiated the change
fqn – fully qualified table name
change_category – create or update (based on whether a “before” snapshot was present)
change_key – readable sequence per table:
- First ever create: create-1
- Subsequent updates on that lineage: create-1~update-1, create-1~update-2, …
- If the table is later dropped & re‑created: the next lineage becomes create-2, etc.
before_value – JSON snapshot before change (may be null on first create)
differences – JSON diff dict that was applied
after_value – JSON snapshot after change
notes – optional free text
created_at / created_by / updated_at / updated_by

Uniqueness expectation: (fqn, change_key) is effectively unique over time.
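
The change_key sequence above can be expressed as a small function of the most recent key for a table and the new change_category. The sketch below only illustrates the naming rule; `next_change_key` is a hypothetical helper, and rejecting an update with no prior key is an assumption rather than documented Layker behavior.

```python
import re
from typing import Optional

# Matches "create-N" or "create-N~update-M".
_KEY_RE = re.compile(r"^create-(\d+)(?:~update-(\d+))?$")


def next_change_key(last_key: Optional[str], category: str) -> str:
    """Derive the next change_key for one fqn from its latest key."""
    if category not in ("create", "update"):
        raise ValueError(f"unknown change_category: {category!r}")

    if last_key is None:
        # Assumption: an update cannot start a lineage.
        if category == "update":
            raise ValueError("update requested but no prior create was logged")
        return "create-1"

    match = _KEY_RE.match(last_key)
    if match is None:
        raise ValueError(f"unrecognized change_key: {last_key!r}")

    create_n = int(match.group(1))
    update_n = int(match.group(2) or 0)

    if category == "create":
        # Table was dropped and re-created: start the next lineage.
        return f"create-{create_n + 1}"
    return f"create-{create_n}~update-{update_n + 1}"


# Walk through the lineage described above.
key = None
for category in ["create", "update", "update", "create"]:
    key = next_change_key(key, category)
    print(key)
```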

Modes & parameters

mode: validate | diff | apply | all
- validate: only YAML validation (exits on success)
- diff: prints proposed changes and exits
- apply: applies changes only
- all: validate → diff → apply → audit
audit_log_table:
- False – disable auditing
- True – use default layker/resources/layker_audit.yaml
- str – path to a custom audit YAML (the YAML governs the destination table)
No‑op safety: if there are no changes, Layker exits early and skips audit.
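
To make the three accepted shapes of audit_log_table concrete, here is a minimal sketch of how such a parameter could be resolved to a YAML path, alongside a strict mode check. The helper names `resolve_audit_yaml` and `check_mode` are hypothetical; Layker's own parameter validation may behave differently.

```python
from typing import Optional, Union

# Packaged default named in this README.
DEFAULT_AUDIT_YAML = "layker/resources/layker_audit.yaml"
VALID_MODES = ("validate", "diff", "apply", "all")


def resolve_audit_yaml(audit_log_table: Union[bool, str]) -> Optional[str]:
    """Map audit_log_table to an audit YAML path, or None when disabled."""
    if audit_log_table is False:
        return None
    if audit_log_table is True:
        return DEFAULT_AUDIT_YAML
    if isinstance(audit_log_table, str) and audit_log_table.strip():
        return audit_log_table
    raise ValueError("audit_log_table must be True, False, or a YAML path")


def check_mode(mode: str) -> str:
    """Return mode unchanged if it is one of the documented values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return mode


print(resolve_audit_yaml(True))
print(resolve_audit_yaml(False))
print(resolve_audit_yaml("conf/my_audit.yaml"))  # hypothetical custom path
```

Checking `is True` and `is False` before the string case keeps the boolean and path meanings from being confused with truthy values such as a non-empty string.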

Serverless & classic environments

Layker is compatible with Databricks Serverless and classic clusters. If an operation isn’t supported on serverless, Layker automatically avoids it and continues with the rest of the flow.

Repository layout

For the full tree, see docs/tree.txt.

Condensed layout:

```
layker/
├── .github/
│   └── workflows/
│       └── workflow.yaml
│
├── archive/
│   ├── main.py
│   ├── sanitizer.py
│   ├── snapshot_yaml.py
│   ├── steps_audit.py
│   ├── steps_differences.py
│   ├── steps_loader.py
│   ├── validate.py
│   ├── validators_evolution.py
│   └── yaml.py
│
├── docs/
│   ├── audit.md
│   ├── differences.txt
│   ├── FAQ
│   ├── FLOW
│   ├── future_enhancements.txt
│   ├── snapshot.txt
│   └── tree.txt
│
├── src/
│   ├── layker/
│   │   ├── resources/
│   │   │   ├── config_driven_table_example.yaml
│   │   │   ├── example.yaml
│   │   │   ├── layker_audit.yaml
│   │   │   └── layker_test.yaml
│   │   │
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── color.py
│   │   │   ├── dry_run.py
│   │   │   ├── paths.py
│   │   │   ├── printer.py
│   │   │   ├── spark.py
│   │   │   ├── table.py
│   │   │   ├── timer.py
│   │   │   └── yaml_table_dump.py
│   │   │
│   │   ├── validators/
│   │   │   ├── __init__.py
│   │   │   ├── differences.py
│   │   │   └── params.py
│   │   │
│   │   ├── __about__.py
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── differences.py
│   │   ├── loader.py
│   │   ├── logger.py
│   │   ├── main.py
│   │   ├── snapshot_table.py
│   │   └── snapshot_yaml.py
│   │
│   ├── dev_testing.ipynb
│   └── test_layker.ipynb
│
├── tests/
│   ├── __init__.py
│   ├── test_loader.py
│   └── test_main.py
│
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── requirements.txt
```

Troubleshooting

Spark Connect / serverless: Layker avoids schema inference issues by using explicit schemas when writing the audit row.
Single quotes in comments: Layker sanitizes YAML comments to avoid SQL quoting errors.
No changes but I still see output: A diff containing only full_table_name means no change; Layker exits early with a success message and writes no audit row.
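
To show why single quotes in comments need care, the sketch below escapes a comment before embedding it in a SQL statement. It assumes Spark SQL's default backslash escaping inside single-quoted literals; `sql_string_literal` and `comment_statement` are hypothetical helpers, and Layker's actual sanitizer may take a different approach (for example, removing the quote).

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and single quotes.

    Assumption: the SQL parser treats backslash as the escape character
    inside string literals (Spark SQL's default behavior).
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def comment_statement(fqn: str, comment: str) -> str:
    """Build a COMMENT ON TABLE statement with a safely quoted comment."""
    return f"COMMENT ON TABLE {fqn} IS {sql_string_literal(comment)}"


# Without escaping, the apostrophe would end the literal early.
print(comment_statement("dq_dev.lmg_sandbox.layker_test",
                        "Team's demo table managed by Layker"))
```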

Contributing & License

PRs and issues welcome.
License: see LICENSE in the repo.

Built for engineers, by engineers.
🐟 LAYKER 🐟
