path: root/hclsyntax/token.go
Age | Commit message | Author
2023-10-24 | hclsyntax: Initial work on namespaced functions | Martin Atkins
This introduces a new syntax which allows function names to have namespace prefixes, with the different name parts separated by a double-colon "::" as is common in various other C-derived languages which need to distinguish between scope resolution and attribute/field traversal. Because HCL has separate namespaces for functions and variables, we need to use different punctuation for each to avoid creating parsing ambiguity that could be resolved only with infinite lookahead.

We cannot retroactively change the representation of function names to be a slice of names without breaking the existing API, and so we instead adopt a convention of packing the multi-part names into single strings which the parser guarantees will always be a series of valid identifiers separated by the literal "::" sequence. That means that applications will make namespaced functions available in the EvalContext by naming them in a way that matches this convention.

This is still a subtle compatibility break for any implementation of the syntax-agnostic HCL API against another syntax, because it may now encounter function names in the function table that are not entirely valid identifiers. However, that's okay in practice because a calling application is always in full control of both which syntaxes it supports and which functions it places in the function table, and so an application using some other syntax can simply avoid using namespaced functions until that syntax is updated to understand the new convention.

This initial commit only includes the basic functionality and does not yet update the specification or specification test suite. It also has only minimal unit tests of the parser and evaluator. Before finalizing this in a release we would need to complete that work to make sure everything is consistent and that we have sufficient regression tests for this new capability.
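The packed-name convention described in this commit can be sketched with the standard library alone. This is only an illustration, not HCL's implementation: `isIdent`, `packName`, and `isPackedName` are hypothetical helpers, the identifier rule is a simplified assumption (HCL's real identifier grammar is defined by its specification), and the plain `map` stands in for whatever function table an application places in its EvalContext.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isIdent is a simplified identifier check (an assumption for this sketch):
// a letter or underscore first, then letters, digits, underscores, or dashes.
func isIdent(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		switch {
		case unicode.IsLetter(r) || r == '_':
			// allowed in any position
		case i > 0 && (unicode.IsDigit(r) || r == '-'):
			// allowed only after the first character
		default:
			return false
		}
	}
	return true
}

// packName joins the parts of a namespaced function name into the single
// string form described in the commit message, using "::" as the separator.
func packName(parts ...string) string {
	return strings.Join(parts, "::")
}

// isPackedName reports whether name is a series of identifiers separated by
// the literal "::" sequence. A name with no "::" is a plain function name.
func isPackedName(name string) bool {
	for _, part := range strings.Split(name, "::") {
		if !isIdent(part) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical function table keyed by packed names.
	table := map[string]func(string) string{
		"upper":                   strings.ToUpper,
		packName("text", "lower"): strings.ToLower,
	}
	for _, name := range []string{"upper", "text::lower", "text::", "a:b"} {
		_, defined := table[name]
		fmt.Printf("%q valid=%t defined=%t\n", name, isPackedName(name), defined)
	}
}
```

Keeping the name as one string means the lookup key type does not change, which is the API-compatibility point the commit message makes.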
2023-08-30 | Use Unicode 15 tables for unicode normalization and segmentation | Martin Atkins
To match with the Unicode support in Go 1.21, we'll now use the Unicode 15 tables when we're normalizing Unicode strings and when counting user-perceived characters ("grapheme clusters") for source position purposes.
2023-02-28 | [COMPLIANCE] Add Copyright and License Headers (#586) | hashicorp-copywrite[bot]
* [COMPLIANCE] Add Copyright and License Headers
* add copywrite file and revert headers in testdata
---------
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
Co-authored-by: Liam Cervante <liam.cervante@hashicorp.com>
2022-02-16 | hclsyntax: Allocate copy of tok.Range only when it's needed | Leandro López (inkel)
In a similar fashion to the parent commit, instead of always copying the tok.Range for later use, we define a function that returns the copied value, so the copy is allocated only if it is actually needed.

For the benchmark introduced earlier, the reduction in allocations and memory usage is outstanding:

    name          old time/op    new time/op    delta
    LexConfig-12  9.05µs ± 1%    7.83µs ± 1%    -13.54%  (p=0.000 n=10+10)

    name          old alloc/op   new alloc/op   delta
    LexConfig-12  7.98kB ± 0%    6.06kB ± 0%    -24.07%  (p=0.000 n=10+10)

    name          old allocs/op  new allocs/op  delta
    LexConfig-12  37.0 ± 0%      7.0 ± 0%       -81.08%  (p=0.000 n=10+10)

Benchmarks were created using:

    go test -benchmem -benchtime=200000x -count=10 -bench=.
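The "copy only when needed" idea can be sketched as follows. The `Range`, `Token`, and `Diagnostic` types and the `check` function are hypothetical stand-ins, not the hclsyntax types, and whether the deferred copy really avoids a heap allocation depends on the Go compiler's escape analysis, so treat this as an illustration of the pattern rather than a reproduction of the measured change.

```go
package main

import "fmt"

// Range is a hypothetical stand-in for a source range.
type Range struct{ Start, End int }

// Token is a hypothetical stand-in for a lexer token.
type Token struct {
	Kind  rune
	Bytes []byte
	Range Range
}

// Diagnostic keeps a pointer to a range, so reporting one requires a copy
// of the token's range that outlives the loop iteration.
type Diagnostic struct {
	Summary string
	Subject *Range
}

// check reports a diagnostic for every token of kind '?'.
func check(tokens []Token) []Diagnostic {
	var diags []Diagnostic
	for _, tok := range tokens {
		// rangeCopy makes the copy lazily: valid tokens never call it,
		// so they do not pay for a copy that would go unused.
		rangeCopy := func() *Range {
			r := tok.Range
			return &r
		}
		if tok.Kind == '?' {
			diags = append(diags, Diagnostic{
				Summary: "invalid token",
				Subject: rangeCopy(),
			})
		}
	}
	return diags
}

func main() {
	tokens := []Token{
		{Kind: 'i', Bytes: []byte("foo"), Range: Range{0, 3}},
		{Kind: '?', Bytes: []byte("$"), Range: Range{4, 5}},
	}
	for _, d := range check(tokens) {
		fmt.Printf("%s at %d-%d\n", d.Summary, d.Subject.Start, d.Subject.End)
	}
}
```

In the common case where every token is valid, the closure is never invoked, which is consistent with the drop in allocs/op reported in the benchmark.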
2022-02-16 | hclsyntax: Copy only tok.Range instead of whole object | Leandro López (inkel)
Doing this reduces the memory used by ~11%, as the following benchstat comparison shows:

    name          old time/op    new time/op    delta
    LexConfig-12  9.27µs ± 0%    9.03µs ± 1%    -2.55%   (p=0.000 n=9+10)

    name          old alloc/op   new alloc/op   delta
    LexConfig-12  8.94kB ± 0%    7.98kB ± 0%    -10.74%  (p=0.000 n=10+10)

    name          old allocs/op  new allocs/op  delta
    LexConfig-12  37.0 ± 0%      37.0 ± 0%      ~        (all equal)

Benchmarks were created using:

    go test -benchmem -benchtime=200000x -count=10 -bench=.
2021-02-23 | Use Unicode 13 text segmentation rules | Martin Atkins
HCL uses a number of upstream libraries that implement algorithms defined in Unicode. This commit updates all of those libraries to versions that have Unicode 13 support. The main implication of this for HCL directly is that when it returns column numbers in source locations it will count characters using the Unicode 13 definition of "character", which includes various new multi-codeunit characters added in Unicode 13. These new version dependencies will also make Unicode 13 support available for other functionality that HCL callers might use, such as the stdlib functions in upstream cty, even though HCL itself does not directly use those.
2020-08-24 | hclsyntax: Tailored error for "curly quotes" | Martin Atkins
It seems to be somewhat common for someone to share HCL code via a forum or a document and have the well-meaning word processor or CMS replace the straight quotes with curly quotes, which then leads to confusing errors when someone copies the result and tries to use it as valid HCL configuration.

Here we add a special hint for that, giving a tailored error message instead of the generic "This character is not used within the language" error message.

HCL has always had some of these special hints implemented here, and they were originally implemented with special token types to allow the parser to handle them. However, we later refactored to do the check all at once inside the Lex* family of functions, prior to parsing, so it's now relatively straightforward to handle it as a special case of TokenInvalid rather than an entirely new token type. Perhaps later we'll rework the existing ones to also just use TokenInvalid, but that's a decision for another day.
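A minimal sketch of treating curly quotes as a special case of an invalid token, rather than as a separate token type, might look like this. The function `invalidTokenMessage`, the exact set of quote characters checked, and the wording of `curlyQuoteMsg` are assumptions made for illustration; only the generic message text is taken from the commit message above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	// genericMsg is the generic message quoted in the commit message.
	genericMsg = "This character is not used within the language."
	// curlyQuoteMsg is hypothetical wording for the tailored hint.
	curlyQuoteMsg = "Curly quotes are not valid here; use straight quotes instead."
)

// invalidTokenMessage picks a message for the bytes of an invalid token by
// inspecting its first character. The set of curly quotes is an assumption.
func invalidTokenMessage(tokBytes []byte) string {
	r, _ := utf8.DecodeRune(tokBytes)
	switch r {
	case '\u201c', '\u201d', '\u2018', '\u2019': // left/right double and single curly quotes
		return curlyQuoteMsg
	default:
		return genericMsg
	}
}

func main() {
	for _, src := range []string{"\u201chello\u201d", "\u2018", "$"} {
		fmt.Printf("%q: %s\n", src, invalidTokenMessage([]byte(src)))
	}
}
```

Because the decision is made from the invalid token's bytes alone, no new token type is needed for the parser to recognise.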
2020-05-21 | Fix a wrong error message for bitwise OR | Masayuki Morita
Fixes #376
2020-03-09 | Use Unicode 12.0.0 grapheme cluster segmentation rules | Martin Atkins
HCL uses grapheme cluster segmentation to produce accurate "column" indications in diagnostic messages and other human-oriented source location information. Each new major version of Unicode introduces new codepoints, some of which are defined to combine with other codepoints to produce a single visible character (grapheme cluster).

We were previously using the rules from Unicode 9.0.0. This change switches to using the segmentation rules from Unicode 12.0.0, which is the latest version at the time of this commit and is also the version of Unicode used for other purposes by the Go 1.14 runtime.

HCL does not use text segmentation results for any purpose that would affect the meaning of decoded data extracted from HCL files, so this change will only affect the human-oriented source positions generated for files containing characters that were newly-introduced in Unicode 10, 11, or 12. (Machine-oriented uses of source location information are based on byte offsets and not affected by text segmentation.)
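To see why byte offsets and human-oriented columns can disagree, consider a letter followed by a combining mark. The sketch below uses only the Go standard library, which can count bytes and code points but does not provide grapheme cluster segmentation; counting user-perceived characters needs a library that implements the Unicode segmentation rules, which is not shown here. The helper `byteAndRuneCounts` is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts returns the length of s in bytes and in code points.
// Neither value is the grapheme cluster count used for display columns.
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one visible
	// character, but it is two code points and three bytes in UTF-8.
	s := "e\u0301"
	b, r := byteAndRuneCounts(s)
	fmt.Printf("bytes=%d code points=%d\n", b, r)
}
```

A byte-based position after this text advances by three, a code-point count by two, and a grapheme-based column by one, which is why only the human-oriented positions depend on the Unicode version in use.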
2019-09-09 | Unfold the "hcl" directory up into the root | Martin Atkins
The main HCL package is more visible this way, and so it's easier than having to pick it out from dozens of other package directories.