|
To match the Unicode support in Go 1.21, we'll now use the Unicode 15
tables when we're normalizing Unicode strings and when counting
user-perceived characters ("grapheme clusters") for source position
purposes.
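
To make that concrete, here's a small standalone illustration (not part of
the change itself) of counting grapheme clusters, assuming the
go-textseg/v15 segmentation library that HCL relies on for this after the
upgrade: a multi-codepoint emoji sequence counts as a single user-perceived
character.

    package main

    import (
        "bufio"
        "fmt"
        "strings"

        "github.com/apparentlymart/go-textseg/v15/textseg"
    )

    func main() {
        // "a" + woman/ZWJ/rocket (astronaut emoji) + "b": five codepoints,
        // but only three user-perceived characters (grapheme clusters).
        s := "a\U0001F469\u200D\U0001F680b"

        sc := bufio.NewScanner(strings.NewReader(s))
        sc.Split(textseg.ScanGraphemeClusters)

        count := 0
        for sc.Scan() {
            count++
        }
        fmt.Println(count) // 3
    }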
|
|
* [COMPLIANCE] Add Copyright and License Headers
* add copywrite file and revert headers in testdata
---------
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
Co-authored-by: Liam Cervante <liam.cervante@hashicorp.com>
|
|
When processing a template string, the lexer can emit multiple string
literal tokens for what ought to be a single string literal. This occurs
when the string contains escape sequences, or consecutive characters
which are indistinguishable from escape sequences at tokenization time.
This leads to a confusing AST and causes heuristics about template
expressions to fail. Specifically, when parsing a traversal with an
index, a key value containing an escape symbol will cause the parser to
generate an index expression instead of a traversal.
This commit adds a post-processing step to the template parser to meld
any sequences of string literals into a single string literal. Existing
tests covered the previous misbehaviour (several of which had comments
apologizing for it), and have been updated accordingly.
The new behaviour of the `IsStringLiteral` method of `TemplateExpr` is
covered with a new set of tests.
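
As a rough sketch of the melding idea (not the parser's actual
implementation; the helper name is invented for illustration), collapsing
adjacent string-literal parts of a template could look like this:

    package main

    import (
        "fmt"

        "github.com/hashicorp/hcl/v2/hclsyntax"
        "github.com/zclconf/go-cty/cty"
    )

    // meldStringLiterals collapses runs of adjacent string literal parts
    // into a single literal part. (Illustrative only; the real parser also
    // extends the source range of the surviving part.)
    func meldStringLiterals(parts []hclsyntax.Expression) []hclsyntax.Expression {
        var out []hclsyntax.Expression
        for _, part := range parts {
            lit, isLit := part.(*hclsyntax.LiteralValueExpr)
            if isLit && lit.Val.Type().Equals(cty.String) && len(out) > 0 {
                if prev, ok := out[len(out)-1].(*hclsyntax.LiteralValueExpr); ok && prev.Val.Type().Equals(cty.String) {
                    // Extend the previous literal instead of appending a new
                    // part, so a string containing escapes still ends up as a
                    // single literal after parsing.
                    prev.Val = cty.StringVal(prev.Val.AsString() + lit.Val.AsString())
                    continue
                }
            }
            out = append(out, part)
        }
        return out
    }

    func main() {
        // Three literal parts, as the lexer might emit for a string that
        // contains escape sequences.
        parts := []hclsyntax.Expression{
            &hclsyntax.LiteralValueExpr{Val: cty.StringVal(`{"key": `)},
            &hclsyntax.LiteralValueExpr{Val: cty.StringVal(`"value"`)},
            &hclsyntax.LiteralValueExpr{Val: cty.StringVal(`}`)},
        }
        fmt.Println(len(meldStringLiterals(parts))) // 1
    }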
|
|
For parts of the input that are delimited by start and end tokens, our
typical pattern is to keep scanning for items until we reach the end token,
or to generate an error if we encounter a token that isn't valid.
In some common examples of that we'll now treat TokenEOF as special and
report a different message about the element being unclosed, because it
seems common in practice for folks to leave off closing delimiters and
then be confused about HCL reporting a parse error at the end of the
file. Instead, we'll now report the error from the perspective of the
opening token(s) and describe that construct as "unclosed", because the
EOF range is generally less useful than any range that actually contains
some relevant characters.
This is not totally comprehensive for all cases, but covers some
situations that I've seen folks ask about, and some others that were
similar enough to those that it was easy to modify them in the same ways.
This does change one of the error ranges constrained by the specification
test suite, but in practice we're not actually using that test suite to
represent the "specification" for HCL, so it's better to change the
hypothetical specification to call for better error reporting behavior
than to retain the old behavior just because it happened to be encoded
in the (unfinished) test suite.
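
As a rough sketch of the pattern (not the real hclsyntax parser code; the
scanning helper is invented for illustration): when TokenEOF arrives before
the closing delimiter, the diagnostic's Subject points at the opening token
rather than at the end of the file.

    package main

    import (
        "fmt"

        "github.com/hashicorp/hcl/v2"
        "github.com/hashicorp/hcl/v2/hclsyntax"
    )

    // scanBlockBody walks the tokens after an opening brace and, if it hits
    // TokenEOF before a closing brace, reports the opening token's range as
    // the subject of an "unclosed" diagnostic.
    func scanBlockBody(open hclsyntax.Token, toks hclsyntax.Tokens) hcl.Diagnostics {
        var diags hcl.Diagnostics
        for _, tok := range toks {
            switch tok.Type {
            case hclsyntax.TokenCBrace:
                return diags // properly closed
            case hclsyntax.TokenEOF:
                return append(diags, &hcl.Diagnostic{
                    Severity: hcl.DiagError,
                    Summary:  "Unclosed configuration block",
                    Detail:   "There is no closing brace for this block before the end of this file.",
                    Subject:  &open.Range, // the opening brace, not the EOF position
                })
            }
            // (a real parser would parse body items here)
        }
        return diags
    }

    func main() {
        src := []byte("service \"example\" {\n  port = 8080\n") // missing "}"
        toks, _ := hclsyntax.LexConfig(src, "example.hcl", hcl.InitialPos)
        for i, tok := range toks {
            if tok.Type == hclsyntax.TokenOBrace {
                fmt.Println(scanBlockBody(tok, toks[i+1:]).Error())
                break
            }
        }
    }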
|
|
It's a common error to accidentally include a ${ intended to be processed
as part of another language after HCL processing, such as a shell script
or an AWS IAM policy document.
Exactly what sort of error will appear in that case unfortunately depends
on the syntax for that other language, but a lot of them happen to end up
in the codepath where we report the "Extra characters after interpolation
expression" diagnostic message. For that reason, this extends that error
message with an additional hint about escaping, and I've also included
a special case for colons because they happen to arise in both Bash and
AWS IAM interpolation syntax, albeit with different meanings.
Because this message is coming from HCL, and HCL doesn't typically assume
anything about the application where it's being used, the hint message is
pretty generic and focuses only on the escaping syntax, in the hope that
this will be enough to prompt the user to think about what they are
currently working on and realize how to respond to this error.
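
For reference, the escaping syntax the hint points at is the doubled dollar
sign: writing $${ in a quoted template produces a literal ${ in the result
instead of starting an interpolation. A small standalone example:

    package main

    import (
        "fmt"

        "github.com/hashicorp/hcl/v2"
        "github.com/hashicorp/hcl/v2/hclsyntax"
    )

    func main() {
        // $${HOME} is the escaped form, so HCL emits a literal ${HOME}
        // for the shell (or IAM, etc.) to interpret later.
        src := []byte("command = \"echo $${HOME}\"\n")

        f, diags := hclsyntax.ParseConfig(src, "example.hcl", hcl.InitialPos)
        if diags.HasErrors() {
            panic(diags.Error())
        }

        attrs, _ := f.Body.JustAttributes()
        val, _ := attrs["command"].Expr.Value(nil)
        fmt.Println(val.AsString()) // echo ${HOME}
    }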
|
|
HCL uses a number of upstream libraries that implement algorithms defined
in Unicode. This commit updates all of those libraries to versions that
have Unicode 13 support.
The main implication of this for HCL directly is that when it returns
column numbers in source locations it will count characters using the
Unicode 13 definition of "character", which includes various new
multi-codepoint characters added in Unicode 13.
These new version dependencies will also make Unicode 13 support available
for other functionality that HCL callers might use, such as the stdlib
functions in upstream cty, even though HCL itself does not directly use
those.
|
|
HCL uses grapheme cluster segmentation to produce accurate "column"
indications in diagnostic messages and other human-oriented source
location information. Each new major version of Unicode introduces new
codepoints, some of which are defined to combine with other codepoints to
produce a single visible character (grapheme cluster).
We were previously using the rules from Unicode 9.0.0. This change
switches to using the segmentation rules from Unicode 12.0.0, which is
the latest version at the time of this commit and is also the version of
Unicode used for other purposes by the Go 1.14 runtime.
HCL does not use text segmentation results for any purpose that would
affect the meaning of decoded data extracted from HCL files, so this
change will only affect the human-oriented source positions generated for
files containing characters that were newly-introduced in Unicode 10, 11,
or 12. (Machine-oriented uses of source location information are based on
byte offsets and not affected by text segmentation.)
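
To make the distinction concrete (a small example written for illustration,
not part of this change): the token after a multi-codepoint emoji gets a
column computed by counting grapheme clusters, while its byte offset still
counts every byte of the sequence.

    package main

    import (
        "fmt"

        "github.com/hashicorp/hcl/v2"
        "github.com/hashicorp/hcl/v2/hclsyntax"
    )

    func main() {
        // The astronaut emoji is three codepoints (woman + ZWJ + rocket) and
        // eleven UTF-8 bytes, but it is a single grapheme cluster, so it
        // advances the column by only one.
        src := []byte("greeting = \"hi\U0001F469\u200D\U0001F680\"\n")

        toks, _ := hclsyntax.LexConfig(src, "example.hcl", hcl.InitialPos)
        for _, tok := range toks {
            if tok.Type == hclsyntax.TokenCQuote {
                // Column counts grapheme clusters; Byte counts raw bytes,
                // so this should report column 16 but byte offset 25.
                fmt.Printf("closing quote at column %d, byte offset %d\n",
                    tok.Range.Start.Column, tok.Range.Start.Byte)
            }
        }
    }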
|
|
The main HCL package is more visible this way, and so it's easier to find
than when it had to be picked out from dozens of other package directories.
|