layker

🐟 Layker 🐟
Lakehouse‑Aligned YAML Kit for Engineering Rules

Declarative table metadata control for Databricks & Spark.
Layker turns a YAML spec into safe, validated DDL with a built‑in audit log. If nothing needs to change, Layker exits cleanly. If something must change, you’ll see it first.


Quick Navigation
- What is Layker?
- Installation
- Quickstart
- How it works
- Audit log model
- Modes & parameters
- Serverless & classic
- Repository layout
- Troubleshooting
- Contributing & License

What is Layker?

Layker is a Python package for managing table DDL, metadata, and auditing with a single YAML file as the source of truth.

Highlights

Installation

Stable:

pip install layker

Latest (main):

pip install "git+https://github.com/Levi-Gagne/layker.git"

Python 3.8+ and Spark 3.3+ are recommended. If you already have PySpark on the cluster, Layker will use it.


Quickstart

1) Author a YAML spec

Minimal example (save as src/layker/resources/example.yaml):

catalog: dq_dev
schema: lmg_sandbox
table: layker_test

columns:
  1:
    name: id
    datatype: bigint
    nullable: false
    active: true
  2:
    name: name
    datatype: string
    nullable: true
    active: true

table_comment: Demo table managed by Layker
table_properties:
  delta.columnMapping.mode: "name"
  delta.minReaderVersion: "2"
  delta.minWriterVersion: "5"

primary_key: [id]
tags:
  domain: demo
  owner: team-data
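
To make the mapping from spec to DDL concrete, here is a minimal sketch of how a spec like the one above could be rendered into a CREATE TABLE statement. It is not Layker's actual generator: the function name sketch_create_table, the dict form of the spec, and the choice to skip inactive columns are all assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical in-memory form of the spec above (roughly what a YAML loader returns).
spec: Dict[str, Any] = {
    "catalog": "dq_dev",
    "schema": "lmg_sandbox",
    "table": "layker_test",
    "columns": {
        1: {"name": "id", "datatype": "bigint", "nullable": False, "active": True},
        2: {"name": "name", "datatype": "string", "nullable": True, "active": True},
    },
}


def sketch_create_table(spec: Dict[str, Any]) -> str:
    """Render an illustrative CREATE TABLE statement from a spec dict."""
    fqn = f"{spec['catalog']}.{spec['schema']}.{spec['table']}"
    col_defs = []
    # Column keys are ordinals, so sorting them preserves the declared order.
    for position in sorted(spec["columns"]):
        col = spec["columns"][position]
        if not col.get("active", True):
            continue  # assumption: inactive columns are left out of the DDL
        null_clause = "" if col.get("nullable", True) else " NOT NULL"
        col_defs.append(f"  `{col['name']}` {col['datatype'].upper()}{null_clause}")
    return f"CREATE TABLE IF NOT EXISTS {fqn} (\n" + ",\n".join(col_defs) + "\n)"


print(sketch_create_table(spec))
```

The real tool also handles comments, properties, keys, and tags; this sketch covers only the column list.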

2) Sync from Python

from pyspark.sql import SparkSession
from layker.main import run_table_load

spark = SparkSession.builder.appName("layker").getOrCreate()

run_table_load(
    yaml_path="src/layker/resources/example.yaml",
    env="prd",
    dry_run=False,
    mode="all",                 # validate | diff | apply | all
    audit_log_table=True        # True=default audit YAML, False=disable, or str path to an audit YAML
)

3) Or via CLI

python -m layker src/layker/resources/example.yaml prd false all true

When audit_log_table=True, Layker uses the packaged default: layker/resources/layker_audit.yaml.
You can also pass a custom YAML path. Either way, the YAML defines the audit table’s location.
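
The three accepted forms of audit_log_table can be summarized with a small sketch. The helper resolve_audit_yaml is hypothetical and only illustrates the rule described above; Layker's internal handling may differ.

```python
from typing import Optional, Union

# Default location named in this README, treated here as a plain string.
DEFAULT_AUDIT_YAML = "layker/resources/layker_audit.yaml"


def resolve_audit_yaml(audit_log_table: Union[bool, str]) -> Optional[str]:
    """Return the audit YAML path to use, or None when auditing is disabled."""
    if audit_log_table is True:
        return DEFAULT_AUDIT_YAML  # packaged default
    if audit_log_table is False:
        return None  # auditing disabled
    if isinstance(audit_log_table, str) and audit_log_table.strip():
        return audit_log_table  # custom audit YAML path
    raise ValueError("audit_log_table must be True, False, or a non-empty path")


print(resolve_audit_yaml(True))
print(resolve_audit_yaml(False))
print(resolve_audit_yaml("conf/my_audit.yaml"))  # hypothetical custom path
```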


How it works (at a glance)

  1. Validate YAML → fast fail with exact reasons, or proceed.
  2. Snapshot live table (if it exists).
  3. Compute differences between YAML snapshot and table snapshot.
    • If there are no changes (i.e., the diff contains only full_table_name), Layker exits with a success message and writes no audit row.
  4. Validate differences (schema‑evolution preflight):
    • Detects add/rename/drop column intents.
    • Requires Delta properties for evolution:
      • delta.columnMapping.mode = name
      • delta.minReaderVersion = 2
      • delta.minWriterVersion = 5
    • On missing requirements, prints details and exits.
  5. Apply changes (create/alter) using generated SQL.
  6. Audit (only if changes were applied and auditing is enabled):
    • Writes a row containing:
      • before_value (JSON), differences (JSON), after_value (JSON)
      • change_category (create or update)
      • change_key (human‑readable sequence per table)
      • env, yaml_path, fqn, timestamps, actor, etc.
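
Steps 3 and 4 above can be sketched in a few lines. This is an illustration under stated assumptions, not Layker's implementation: snapshots are modeled as flat dicts, and the names compute_diff, has_changes, and missing_evolution_properties are hypothetical. The required property values are taken from the list above and compared for exact equality.

```python
from typing import Any, Dict, List

# Delta properties required for schema evolution, as listed above.
REQUIRED_DELTA_PROPERTIES = {
    "delta.columnMapping.mode": "name",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
}


def compute_diff(yaml_snap: Dict[str, Any], table_snap: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the keys whose values differ, plus full_table_name."""
    diff: Dict[str, Any] = {"full_table_name": yaml_snap["full_table_name"]}
    for key in sorted(set(yaml_snap) | set(table_snap)):
        if key == "full_table_name":
            continue
        want, have = yaml_snap.get(key), table_snap.get(key)
        if want != have:
            diff[key] = {"from": have, "to": want}
    return diff


def has_changes(diff: Dict[str, Any]) -> bool:
    """A diff holding only full_table_name means there is nothing to apply."""
    return set(diff) != {"full_table_name"}


def missing_evolution_properties(table_properties: Dict[str, Any]) -> List[str]:
    """List each required Delta property that is absent or has another value."""
    return [
        f"{name} must be {expected!r}, found {table_properties.get(name)!r}"
        for name, expected in REQUIRED_DELTA_PROPERTIES.items()
        if str(table_properties.get(name)) != expected
    ]


live = {"full_table_name": "dq_dev.lmg_sandbox.layker_test", "table_comment": "old"}
wanted = {"full_table_name": "dq_dev.lmg_sandbox.layker_test", "table_comment": "new"}
print(has_changes(compute_diff(wanted, live)))
print(missing_evolution_properties({"delta.columnMapping.mode": "name"}))
```

Returning the problems as a list, rather than raising on the first one, matches the "prints details and exits" behavior described in step 4.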

Audit log model

The default audit YAML (layker/resources/layker_audit.yaml) defines these columns (in order):

Uniqueness expectation: (fqn, change_key) is effectively unique over time.
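
One way to check this expectation is to count (fqn, change_key) pairs over exported audit rows. The sketch below assumes rows are available as plain dicts; the helper name duplicate_audit_keys and the change_key values shown are placeholders, not Layker's actual key format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def duplicate_audit_keys(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return every (fqn, change_key) pair that appears more than once."""
    counts = Counter((row["fqn"], row["change_key"]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Placeholder rows; real change_key values follow Layker's own format.
rows = [
    {"fqn": "dq_dev.lmg_sandbox.layker_test", "change_key": "key-1"},
    {"fqn": "dq_dev.lmg_sandbox.layker_test", "change_key": "key-2"},
    {"fqn": "dq_dev.lmg_sandbox.other_table", "change_key": "key-1"},
]
print(duplicate_audit_keys(rows))
```

An empty result means the expectation holds for the rows checked.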


Modes & parameters


Serverless & classic environments

Layker is compatible with Databricks Serverless and classic clusters. If an operation isn’t supported on serverless, Layker automatically avoids it and continues with the rest of the flow.


Repository layout

For the full tree, see docs/tree.txt.

Condensed layout:

```
layker/
├── .github/
│   └── workflows/
│       └── workflow.yaml
│
├── archive/
│   ├── main.py
│   ├── sanitizer.py
│   ├── snapshot_yaml.py
│   ├── steps_audit.py
│   ├── steps_differences.py
│   ├── steps_loader.py
│   ├── validate.py
│   ├── validators_evolution.py
│   └── yaml.py
│
├── docs/
│   ├── audit.md
│   ├── differences.txt
│   ├── FAQ
│   ├── FLOW
│   ├── future_enhancements.txt
│   ├── snapshot.txt
│   └── tree.txt
│
├── src/
│   ├── layker/
│   │   ├── resources/
│   │   │   ├── config_driven_table_example.yaml
│   │   │   ├── example.yaml
│   │   │   ├── layker_audit.yaml
│   │   │   └── layker_test.yaml
│   │   │
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── color.py
│   │   │   ├── dry_run.py
│   │   │   ├── paths.py
│   │   │   ├── printer.py
│   │   │   ├── spark.py
│   │   │   ├── table.py
│   │   │   ├── timer.py
│   │   │   └── yaml_table_dump.py
│   │   │
│   │   ├── validators/
│   │   │   ├── __init__.py
│   │   │   ├── differences.py
│   │   │   └── params.py
│   │   │
│   │   ├── __about__.py
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── differences.py
│   │   ├── loader.py
│   │   ├── logger.py
│   │   ├── main.py
│   │   ├── snapshot_table.py
│   │   └── snapshot_yaml.py
│   │
│   ├── dev_testing.ipynb
│   └── test_layker.ipynb
│
├── tests/
│   ├── __init__.py
│   ├── test_loader.py
│   └── test_main.py
│
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── requirements.txt
```

Troubleshooting


Contributing & License

PRs and issues welcome.
License: see LICENSE in the repo.

Built for engineers, by engineers.
🐟 LAYKER 🐟