Canonical Forms in Sigil
Sigil enforces canonical forms so one valid program has one accepted surface.
This document records the current canonical rules enforced by the lexer, parser, validator, and typechecker in this repository.
Why Canonical Forms Exist
Canonical forms are not style guidance. They are part of the language contract.
Goals:
- remove alternative spellings for the same construct
- improve deterministic code generation
- make diagnostics corrective instead of advisory
- keep examples, tests, and generated code aligned
File Purpose
Sigil uses file extensions to distinguish file purpose:
- .lib.sigil for libraries
- .sigil for executables and tests
Current canonical rules include:
- .lib.sigil files must not define main
- non-test .sigil files must define main
- test declarations are only allowed under tests/
Filename Rules
Basenames must be lowerCamelCase.
Valid:
- hello.sigil
- userService.lib.sigil
- example01Introduction.sigil
Invalid:
- UserService.sigil
- user_service.lib.sigil
- user-service.sigil
- 1intro.sigil
Current filename diagnostics:
- SIGIL-CANON-FILENAME-CASE
- SIGIL-CANON-FILENAME-INVALID-CHAR
- SIGIL-CANON-FILENAME-FORMAT
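The filename rule can be pictured as a small check over the basename. The following is a minimal Python sketch, not the validator's actual code: the function names are hypothetical, and it assumes that lowerCamelCase means a lowercase ASCII letter followed only by ASCII letters and digits.

```python
import re
from typing import Optional

# Assumption: lowerCamelCase = lowercase ASCII letter, then ASCII letters/digits.
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")


def split_basename(filename: str) -> Optional[str]:
    """Return the basename, or None when the extension is not a Sigil one."""
    # Check the longer suffix first so "x.lib.sigil" is not read as "x.lib".
    for suffix in (".lib.sigil", ".sigil"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def is_canonical_filename(filename: str) -> bool:
    """True when the basename is lowerCamelCase under the assumption above."""
    basename = split_basename(filename)
    return basename is not None and LOWER_CAMEL.fullmatch(basename) is not None
```

Under these assumptions the sketch accepts the valid names listed above and rejects the invalid ones. It does not try to choose between the three diagnostic codes.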
Declaration Ordering
Top-level declarations must appear in this category order:
t => e => c => λ => test
Module scope is declaration-only. Top-level l is invalid.
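As an illustration, the category order can be checked with a single pass that remembers the highest category seen so far. This is a hedged Python sketch: the function name and the idea of passing categories as a list of strings are assumptions made for the example, not the parser's data model.

```python
# Category order as stated above; the string labels are illustrative.
CATEGORY_ORDER = ["t", "e", "c", "λ", "test"]
RANK = {category: index for index, category in enumerate(CATEGORY_ORDER)}


def first_out_of_order(categories):
    """Return the index of the first declaration that breaks the order, or None."""
    highest = -1
    for index, category in enumerate(categories):
        rank = RANK[category]
        if rank < highest:
            # A category appeared after a later category had already been seen.
            return index
        highest = rank
    return None
```

For example, `["t", "c", "λ", "test"]` passes, while `["t", "λ", "c"]` is flagged at index 2 because a constant follows a function.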
No export Keyword
Current Sigil does not have an export token.
Visibility is file-based:
- declarations in .lib.sigil files are referenceable from other modules
- .sigil files are executable-oriented
Function and Lambda Surface
Canonical function/lambda rules:
- parameter types are required
- return types are required
- effects, when present, appear between => and the return type
- = is required before non-match bodies
- = is forbidden before match bodies
Examples:
λadd(x:Int,y:Int)=>Int=x+y
λfactorial(n:Int)=>Int match n{
0=>1|
1=>1|
value=>value*factorial(value-1)
}
Constants
Current constant syntax is typed value ascription:
c answer=(42:Int)
The older c answer:Int=42 form is not current Sigil.
Records and Maps
Records and maps are distinct.
- records use :
- maps use ↦
Examples:
t User={id:Int,name:String}
t Scores={String↦Int}
Record fields must appear in canonical alphabetical order in:
- product type declarations
- record literals
- typed record constructors
Local Binding Rules
Local names must not shadow names from the same or any enclosing lexical scope.
This applies to:
- function parameters
- lambda parameters
- l bindings
- pattern bindings
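One way to picture the no-shadowing rule is a chain of scopes in which a new name is rejected if it is already visible anywhere up the chain. The Python sketch below is illustrative only: the `Scope` class and its methods are hypothetical, and it assumes that sibling scopes do not see each other's names.

```python
class Scope:
    """Hypothetical lexical scope with a link to its enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = set()

    def is_visible(self, name):
        # Walk from this scope outward through every enclosing scope.
        scope = self
        while scope is not None:
            if name in scope.names:
                return True
            scope = scope.parent
        return False

    def bind(self, name):
        # Reject names already bound in this scope or any enclosing one.
        if self.is_visible(name):
            raise ValueError(f"'{name}' shadows an existing binding")
        self.names.add(name)
```

In this model a function's parameters are bound in the outer scope, and each l binding or pattern binding in the body calls `bind` on a nested scope, so reusing a parameter name fails.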
Single-Use Pure Bindings
Sigil currently rejects pure local bindings used exactly once.
Example:
λgreeting(name:String)=>String={
l prefix=("Hello, ":String);
prefix++name
}
Required canonical form:
λgreeting(name:String)=>String="Hello, "++name
Current mechanical rule:
- if a local binding is pure
- and the bound name is used exactly once
- the binding is rejected and must be inlined
The current validator does not perform a separate “substitution legality” analysis. This document describes the implementation as it exists today.
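The mechanical rule amounts to a purity flag plus a use count. Here is a minimal Python sketch under loud assumptions: `count_uses` and `must_inline` are hypothetical names, purity is supplied by the caller, and uses are counted by scanning identifier tokens in a body string rather than walking a typed AST.

```python
import re

# Simplification: ASCII identifiers only; string literals are not skipped.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_uses(name, body):
    """Count identifier tokens in `body` that are exactly `name`."""
    return sum(1 for token in IDENTIFIER.findall(body) if token == name)


def must_inline(name, is_pure, body):
    """Mirror the rule: a pure binding used exactly once is rejected."""
    return is_pure and count_uses(name, body) == 1
```

With this sketch, a pure `prefix` used in `prefix++name` must be inlined, while the same binding used in `prefix++name++prefix` is kept because it has two uses.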
No Dead Surface
Sigil also rejects dead names where the compiler can determine they serve no purpose.
Current enforced rules:
- extern declarations must be used
- named local bindings used zero times are rejected
- executable .sigil files reject top-level functions, consts, and types that are not reachable from main or tests
Library note:
- .lib.sigil files may still expose top-level declarations that are unused in the defining file, because the file surface is the module API
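The reachability rule for executable files can be sketched as a graph walk from the roots. The Python below is an illustration, not the compiler's algorithm: `unreachable_declarations` is a hypothetical name, and the reference graph is assumed to be given as a mapping from each top-level declaration to the names it mentions.

```python
def unreachable_declarations(references, roots):
    """Return top-level names that no root (main or a test) can reach.

    `references` maps each top-level declaration to the names it refers to.
    """
    reachable = set()
    stack = [root for root in roots if root in references]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        # Names outside `references` (such as builtins) are harmless here.
        stack.extend(references.get(name, ()))
    return sorted(set(references) - reachable)
```

For a library file, the same walk would not be applied, since every top-level declaration is part of the module API.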
Canonical List Processing
Sigil now rejects a small set of exact recursive list-plumbing clones when the language already has one canonical surface.
Current exact-shape bans:
- recursive append-to-result of the form self(rest)⧺rhs
- hand-rolled recursive all clones
- hand-rolled recursive any clones
- filter followed by length of the form #(xs filter pred)
- hand-rolled recursive map clones
- hand-rolled recursive filter clones
- hand-rolled recursive find clones
- hand-rolled recursive flatMap clones
- hand-rolled recursive reverse clones
- hand-rolled recursive fold clones
Canonical replacements:
- universal checks: §list.all
- existential checks: §list.any
- predicate counting: §list.countIf
- projection: map
- filtering: filter
- first-match search: §list.find
- flattening projection: §list.flatMap
- reduction: reduce ... from ... or §list.fold
- reversal: §list.reverse
- custom list building: wrapper + accumulator helper, reversing once at the end if needed
These are exact-shape validator rules, not general algorithm analysis. Recursive algorithms that do not match these narrow patterns remain valid.
Canonical Helper Wrappers
Outside language/stdlib/, Sigil also rejects exact top-level helper wrappers when the body is already one canonical helper surface over that function's own parameters.
Current exact-wrapper bans:
- direct §... helper calls whose arguments are exactly the function parameters
- direct map wrappers like xs map fn
- direct filter wrappers like xs filter pred
- direct reduce ... from ... wrappers like xs reduce fn from init
Examples of rejected shapes:
λsum1(xs:[Int])=>Int=§list.sum(xs)
λproject[T,U](fn:λ(T)=>U,xs:[T])=>[U]=xs map fn
Required canonical forms:
λdouble(xs:[Int])=>[Int]=xs map (λ(x:Int)=>Int=x*2)
λreportedSum(xs:[Int])=>String=§string.intToString(§list.sum(xs))
This is still a narrow exact-shape rule. Sigil does not try to prove that arbitrary helper code is semantically equivalent to a canonical stdlib/helper surface.
Topology / Config Boundaries
For topology-aware projects:
- topology declarations live in src/topology.lib.sigil
- selected environment bindings live in config/.lib.sigil
- process.env is only allowed in config/*.lib.sigil
- application code must use topology dependency handles, not raw endpoints
Validation is currently per selected --env, not a whole-project scan across all declared environments.
Printer-First Source
Sigil no longer describes canonicality mainly as a checklist of spacing rules. The authoritative rule is:
- parse source
- print the canonical source for that AST internally
- reject the file unless the bytes match exactly
That gives Sigil a source normal form:
- one textual representation per valid AST
- no public formatter command
- no "preferred style" separate from the language
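The parse, print, compare loop can be demonstrated with any language that ships both a parser and a printer. The sketch below uses Python's own `ast` module purely as a stand-in: it is not Sigil's parser or printer, and `is_canonical` is a hypothetical name. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def is_canonical(source: str) -> bool:
    """Accept `source` only if printing its AST reproduces the same bytes."""
    tree = ast.parse(source)        # parse source
    printed = ast.unparse(tree)     # print the normal form for that AST
    return printed.encode("utf-8") == source.encode("utf-8")
```

In this stand-in, `x = 1 + 2` is accepted, while `x=1+2` and `x = (1 + 2)` are rejected even though all three parse to the same tree. That is the sense in which a printer defines one textual representation per valid AST.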
Some surface constraints are still easiest to think about mechanically:
- signatures print on one line
- direct match bodies begin on that same line
- multi-arm match prints multiline
- branching and other non-trivial structure print multiline earlier than dense inline forms
- string values containing newline characters print as multiline " literals with exact preserved line breaks
Canonical examples:
λfactorial(n:Int)=>Int match n{
0=>1|
1=>1|
value=>value*factorial(value-1)
}
Canonical Branching Recursion
Sigil rejects one narrow recursive shape as non-canonical: sibling self-calls that all directly reduce the same parameter while leaving the other arguments unchanged.
Blocked Pattern
λfib(n:Int)=>Int match n{
0=>0|
1=>1|
value=>fib(value-1)+fib(value-2) // ❌ SIGIL-CANON-BRANCHING-SELF-RECURSION
}
Sigil rejects this shape because it duplicates work instead of following one canonical recursion path.
Canonical Replacement
λfib(n:Int)=>Int=fibHelper(0,1,n)
λfibHelper(a:Int,b:Int,n:Int)=>Int match n{
0=>a|
count=>fibHelper(b,a+b,count-1)
}
The preferred replacement is a wrapper plus helper function that threads the working state through one recursive step at a time.
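To make the duplicated work concrete, both shapes can be transliterated into Python with a call counter. This is only an illustration of the two recursion shapes: the `counter` argument is an addition for measurement and is not part of the Sigil code.

```python
def fib_branching(n, counter):
    """Blocked shape: two sibling self-calls that both reduce n."""
    counter[0] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_branching(n - 1, counter) + fib_branching(n - 2, counter)


def fib_helper(a, b, n, counter):
    """Canonical shape: one self-call that threads state through a and b."""
    counter[0] += 1
    if n == 0:
        return a
    return fib_helper(b, a + b, n - 1, counter)


def fib(n, counter):
    """Wrapper that seeds the helper's working state."""
    return fib_helper(0, 1, n, counter)
```

Both versions return 55 for n = 10, but the branching version makes 177 calls while the helper makes 11.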
What Gets Rejected
Sigil rejects only exact branching self-recursion when all of these are true:
- there are multiple sibling self-calls in the same expression
- each self-call directly reduces the same parameter, such as n-1 and n-2
- the other arguments are unchanged across those sibling calls
Sigil also rejects obvious nested amplification of that same shape, such as:
λbad(n:Int)=>Int=bad(bad(n-1)+bad(n-2))
Allowed Patterns
Single recursive call:
λlength(xs:[Int])=>Int match xs{
[]=>0|
[h,.tail]=>1+length(tail)
}
Different non-reduced arguments:
λmerge(left:[Int],right:[Int])=>[Int] match left{
[]=>right|
[lh,.lt]=>match right{
[]=>left|
[rh,.rt]=>match lh≤rh{
true=>[lh]⧺merge(lt,right)|
false=>[rh]⧺merge(left,rt)
}
}
}
Sigil does not attempt general complexity proofs or general exponential-recursion detection. This rule exists to ban one specific non-canonical recursion shape with a clear canonical replacement.
Error Code
SIGIL-CANON-BRANCHING-SELF-RECURSION - Non-canonical branching self-recursion detected. Use a wrapper plus helper state-threading shape instead of sibling self-calls over the same reduced parameter.
Validation Pipeline
Canonical validation happens in two stages:
- after parsing, for syntax- and structure-level canonical rules
- after typechecking, for typed canonical rules such as dead-binding rejection
and single-use pure bindings
The overall pipeline is:
read source
=> tokenize
=> parse
=> canonical validation
=> typecheck
=> typed canonical validation
=> codegen / run / test
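The stage ordering can be sketched as a loop over named stages. The Python below is a hedged illustration: the stage functions are placeholders supplied by the caller, `run_pipeline` is a hypothetical name, and stopping at the first failing stage is an assumption rather than something this document states.

```python
# Stage names follow the pipeline listed above (codegen / run / test omitted).
STAGE_NAMES = [
    "tokenize",
    "parse",
    "canonical validation",
    "typecheck",
    "typed canonical validation",
]


def run_pipeline(source, stages):
    """Run each stage in order, feeding the previous result forward.

    Returns (completed_stage_names, error_message_or_None).
    Assumption: the first stage that raises ValueError ends the run.
    """
    completed = []
    value = source
    for name in STAGE_NAMES:
        try:
            value = stages[name](value)
        except ValueError as error:
            return completed, f"{name}: {error}"
        completed.append(name)
    return completed, None
```

Under that assumption, a file that fails the first canonical validation never reaches the typechecker, and typed canonical rules only run on programs that already typecheck.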
Source of Truth
When prose disagrees with implementation, current truth comes from:
- parser
- validator
- typechecker
- runnable examples and tests