Canonical Enforcement in Sigil
Sigil does not treat canonical form as optional style. The compiler toolchain rejects non-canonical source.
Current Enforcement Model
Canonicality is now printer-first. The compiler parses source, builds an AST, prints the canonical source for that AST internally, and then compares the original file byte-for-byte against that printed form.
If the bytes differ:
- sigil compile fails
- sigil run fails
- sigil test fails
There is no public formatter command. Sigil does not permit “almost canonical” source to run and then normalize later.
Canonical enforcement now happens like this:
Source
=> Tokenize
=> Parse
=> Canonical source print
=> Source == canonical print ?
=> Type check
=> Typed canonical validation
=> Codegen / Run / Test
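The printer-first comparison at the centre of this pipeline can be sketched in Python. This is only an illustration under stated assumptions, not Sigil's implementation: `parse`, `print_canonical`, and `check_canonical` are hypothetical names, and the toy "language" is just whitespace-separated words whose canonical form uses single spaces and a trailing newline.

```python
def parse(source: str) -> list[str]:
    """Hypothetical stand-in parser: the 'AST' is simply the list of words."""
    return source.split()


def print_canonical(ast: list[str]) -> str:
    """Hypothetical stand-in printer: exactly one spelling per AST."""
    return " ".join(ast) + "\n"


def check_canonical(source_bytes: bytes) -> bool:
    """Accept the file only if it is byte-for-byte its own canonical print."""
    ast = parse(source_bytes.decode("utf-8"))
    return print_canonical(ast).encode("utf-8") == source_bytes
```

In this sketch, `check_canonical(b"a b c\n")` is accepted, while `b"a  b c\n"` (double space) and `b"a b c"` (missing trailing newline) are rejected even though both parse to the same AST. The point is that the check needs no per-rule spacing logic: any deviation from the printer's output fails the comparison.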
Lexer-Level Rejections
The lexer rejects some non-canonical source directly:
- tab characters
- standalone \r
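A character-level scan for these two cases might look like the Python sketch below. The function name `lexer_rejections` and the message wording are hypothetical, and the sketch assumes that "standalone" means a \r that is not immediately followed by \n; it makes no claim about how Sigil treats \r\n pairs.

```python
def lexer_rejections(source: str) -> list[str]:
    """Collect lexer-level canonicality errors for a toy source string."""
    errors = []
    for i, ch in enumerate(source):
        if ch == "\t":
            errors.append(f"tab character at offset {i}")
        # Assumption: a \r counts as standalone when no \n follows it directly.
        elif ch == "\r" and source[i + 1 : i + 2] != "\n":
            errors.append(f"standalone \\r at offset {i}")
    return errors
```

Because these checks run before parsing, such source never reaches the printer comparison at all.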
Parse-Time / Surface Constraints
The parser enforces current surface forms such as:
- no export token
- typed parameters
- required return types
- required = before non-match bodies
- forbidden = before match bodies
Canonical Validator
The validator still enforces canonical rules that are not reducible to printing alone, such as:
- filename rules
- declaration ordering
- file-purpose rules
- test location rules
- project-defined type declarations only in src/types.lib.sigil
- src/types.lib.sigil being types-only
- src/types.lib.sigil using only §... and ¶... inside type definitions and constraints
- no dead extern declarations in executable .sigil files
- no dead top-level declarations in executable .sigil files
- no-shadowing
- record field ordering
- exact top-level wrappers around canonical §... helpers and direct map/filter/reduce ... from ... surfaces
- exact recursive list-plumbing bans where Sigil already has a canonical surface
- typed canonical restrictions like dead-binding rejection and single-use pure binding inlining
Typed Canonical Validation
After type checking, the validator enforces typed canonical rules.
Current important examples:
- named local bindings used zero times are rejected
- pure single-use local bindings must be inlined
- obvious literal contradictions against constrained types are rejected
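The first two rules reduce to counting uses per binding. The Python sketch below illustrates that idea on a deliberately simplified model: `Binding`, `check_bindings`, and the flat list of use sites are hypothetical, and purity is supplied as a flag, whereas a real validator would derive it from the typed AST.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Binding:
    name: str
    pure: bool  # assumption: purity is already known from type checking


def check_bindings(bindings: list[Binding], uses: list[str]) -> list[str]:
    """Report bindings used zero times, and pure bindings used exactly once."""
    counts = Counter(uses)
    errors = []
    for b in bindings:
        n = counts[b.name]
        if n == 0:
            errors.append(f"{b.name}: unused binding")
        elif n == 1 and b.pure:
            errors.append(f"{b.name}: pure single-use binding must be inlined")
    return errors
```

With bindings a, b, d pure and c impure, and uses `["b", "c", "d", "d"]`, the sketch flags a (unused) and b (pure, single use), but accepts c (single use but impure) and d (used twice).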
Executable note:
- .sigil files must keep top-level helper functions, consts, and types reachable from main or tests
- .lib.sigil files are still allowed to expose public API that is unused locally
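The reachability requirement for executable files can be pictured as a graph walk from the entry points. The Python sketch below is an assumption-laden illustration, not the validator's algorithm: `dead_declarations` is a hypothetical name, and the dependency map (declaration name to the names it references) is assumed to have been extracted already.

```python
def dead_declarations(deps: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Return top-level declarations not reachable from any root."""
    reachable: set[str] = set()
    stack = [r for r in roots if r in deps]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        # Follow only references to declarations defined in this file.
        stack.extend(d for d in deps[name] if d in deps)
    return set(deps) - reachable
```

For a file where main references helper, helper references Point, and a fourth declaration named unused also references Point, walking from main leaves only unused unreachable. Under this model a .lib.sigil file would simply skip the check, since its public API may legitimately be unused locally.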
Current list-processing examples:
- exact wrappers like λsum1(xs)=>Int=§list.sum(xs) are rejected in favor of §list.sum(xs) directly
- exact wrappers like λproject(fn,xs)=>[U]=xs map fn are rejected in favor of xs map fn directly
- recursive all clones are rejected in favor of §list.all
- recursive any clones are rejected in favor of §list.any
- #(xs filter pred) is rejected in favor of §list.countIf
- recursive map clones are rejected in favor of map
- recursive filter clones are rejected in favor of filter
- recursive find clones are rejected in favor of §list.find
- recursive flatMap clones are rejected in favor of §list.flatMap
- recursive fold clones are rejected in favor of reduce ... from ... / §list.fold
- recursive reverse clones are rejected in favor of §list.reverse
- recursive result-building of the form self(rest)⧺rhs is rejected
Why This Matters
Traditional ecosystems often rely on:
- style guides
- optional formatter passes
- lints that can be ignored
Sigil instead makes canonicality part of the accepted language surface.
That gives:
- one accepted spelling for common constructs
- better machine generation loops
- one textual representation per valid program
Practical Rule
If a doc claims “preferred style” but the compiler accepts multiple parseable forms, that claim is not yet canonical enforcement.
For Sigil, canonicality means the toolchain actually rejects the alternative.
Current high-signal printer choices:
- λfib(n:Int)=>Int match n{...} is canonical; splitting the signature/body introducer is not
- multi-arm match prints multiline
- branching and non-trivial structure print multiline earlier than dense inline forms
- newline-containing string values print as multiline " literals, not \n-escaped one-line strings
- spacing is a consequence of the printer, not a second style system