path: root/src/unicode.hh
Age | Commit message | Author
2024-08-24 | Add back <cwchar> for wcwidth in src/unicode.hh | Chris Webb
On a musl system with clang 18.1.8 linking against libc++, 64ed046e breaks the build with

    src/unicode.hh:105:24: error: use of undeclared identifier 'wcwidth'
      105 |     const auto width = wcwidth((wchar_t)c);

though this doesn't happen on the same system with gcc 14.2.0 linking against libstdc++. Include <cwchar> again so wcwidth() is properly defined.
2024-08-15 | Remove some more unnecessary includes | Maxime Coste
2024-08-12 | Reduce headers dependency graph | Maxime Coste
Move more code into the implementation files to reduce the amount of code pulled by headers.
2024-02-06 | Avoid iswlower, iswupper, towlower and towupper for ascii codepoints | Maxime Coste
Avoid the costly shared object function call when most codepoints will be ascii. The regex benchmark gets a nice speedup:

Regex                                 |   Before |   After
--------------------------------------+----------+---------
'Twain'                               |    25 ms |   15 ms
'(?i)Twain'                           |    74 ms |   57 ms
'[a-z]shing'                          |   323 ms |  303 ms
'Huck[a-zA-Z]+|Saw[a-zA-Z]+'          |    26 ms |   17 ms
'\b\w+nn\b'                           |   424 ms |  393 ms
'[a-q][^u-z]{13}x'                    |   869 ms |  815 ms
'Tom|Sawyer|Huckleberry|Finn'         |    33 ms |   24 ms
'(?i)Tom|Sawyer|Huckleberry|Finn'     |   319 ms |  281 ms
'.{0,2}(Tom|Sawyer|Huckleberry|Finn)' |  1294 ms | 1293 ms
'.{2,4}(Tom|Sawyer|Huckleberry|Finn)' |  1470 ms | 1429 ms
'Tom.{10,25}river|river.{10,25}Tom'   |    69 ms |   61 ms
'[a-zA-Z]+ing'                        |   447 ms |  408 ms
'\s[a-zA-Z]{0,12}ing\s'               |   539 ms |  543 ms
'([A-Za-z]awyer|[A-Za-z]inn)\s'       |   588 ms |  552 ms
'["'][^"']{0,30}[?!\.]["']'           |    92 ms |   81 ms
2024-02-06 | Avoid calling iswalnum for ascii characters | Maxime Coste
iswalnum can be pretty expensive as it's a shared library call.
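The idea behind the two 2024-02-06 entries can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code from unicode.hh: the helper names (is_ascii, is_alnum, is_lower, to_lower) are hypothetical, and Codepoint is taken to be char32_t as recorded in the 2015-04-29 entry. ASCII codepoints are answered inline with range checks; only non-ASCII codepoints reach the libc wide-character functions.

```cpp
#include <cwctype>

using Codepoint = char32_t;

// Hypothetical helper: true for codepoints in the 7-bit ASCII range.
constexpr bool is_ascii(Codepoint c) noexcept { return c < 0x80; }

// Alphanumeric test: inline range checks for ASCII, iswalnum otherwise.
inline bool is_alnum(Codepoint c) noexcept
{
    if (is_ascii(c))
        return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
               (c >= U'0' && c <= U'9');
    return std::iswalnum(static_cast<std::wint_t>(c)) != 0;
}

// Lowercase test: inline range check for ASCII, iswlower otherwise.
inline bool is_lower(Codepoint c) noexcept
{
    if (is_ascii(c))
        return c >= U'a' && c <= U'z';
    return std::iswlower(static_cast<std::wint_t>(c)) != 0;
}

// Lowercase conversion: arithmetic for ASCII, towlower otherwise.
inline Codepoint to_lower(Codepoint c) noexcept
{
    if (is_ascii(c))
        return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;
    return static_cast<Codepoint>(std::towlower(static_cast<std::wint_t>(c)));
}
```

The ASCII branch never leaves the translation unit, so the compiler can inline it into hot loops such as a regex matcher, which is consistent with the speedups reported in the benchmark.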
2021-02-25 | Follow ECMA specification for regex whitespace | Jacob Collins
Changes the behaviour of the \s and \h character classes to include all WhiteSpace and LineTerminator characters defined in the ECMA specification.

- <https://262.ecma-international.org/11.0/#sec-white-space>
- <https://262.ecma-international.org/11.0/#sec-line-terminators>
- <https://262.ecma-international.org/11.0/#sec-characterclassescape>

Fixes #4034
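For reference, the ECMA WhiteSpace and LineTerminator sets can be written out as plain predicates. This is a sketch based on the linked specification sections, not the Kakoune implementation: the function names are hypothetical, and the Space_Separator (Zs) list is hardcoded from current Unicode data, so it is an assumption that may need updating for other Unicode versions.

```cpp
using Codepoint = char32_t;

// ECMA LineTerminator: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
constexpr bool is_ecma_line_terminator(Codepoint c) noexcept
{
    return c == 0x000A || c == 0x000D || c == 0x2028 || c == 0x2029;
}

// ECMA WhiteSpace: TAB, VT, FF, ZWNBSP (U+FEFF), plus every codepoint
// in the Unicode Space_Separator (Zs) category.
constexpr bool is_ecma_white_space(Codepoint c) noexcept
{
    return c == 0x0009 || c == 0x000B || c == 0x000C || c == 0xFEFF ||
           // Zs codepoints, hardcoded (assumption: current Unicode data)
           c == 0x0020 || c == 0x00A0 || c == 0x1680 ||
           (c >= 0x2000 && c <= 0x200A) ||
           c == 0x202F || c == 0x205F || c == 0x3000;
}

// What \s matches under the ECMA CharacterClassEscape rules:
// the union of WhiteSpace and LineTerminator.
constexpr bool matches_ecma_s(Codepoint c) noexcept
{
    return is_ecma_white_space(c) || is_ecma_line_terminator(c);
}
```

Note that U+200B (ZERO WIDTH SPACE) is in neither set, so it is not matched.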
2019-01-24 | Header and dependency cleanup | Maxime Coste
2018-11-27 | Make '_' the default extra_word_chars, and remove built-in support | Maxime Coste
Fixes #2599
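A rough sketch of what this change implies for word classification: '_' is no longer special-cased in the predicate itself and only counts as a word character when it is listed in extra_word_chars. The is_word signature and the std::vector representation of the option are assumptions made for illustration, not taken from the source.

```cpp
#include <algorithm>
#include <cwctype>
#include <vector>

using Codepoint = char32_t;

// Hypothetical word predicate: alphanumeric codepoints are always word
// characters; anything else only if the caller lists it in
// extra_word_chars. There is no built-in case for '_'.
inline bool is_word(Codepoint c, const std::vector<Codepoint>& extra_word_chars)
{
    if (std::iswalnum(static_cast<std::wint_t>(c)) != 0)
        return true;
    return std::find(extra_word_chars.begin(), extra_word_chars.end(), c)
           != extra_word_chars.end();
}
```

With the default option value {'_'} the behaviour for '_' is unchanged, while a user who sets the option to an empty list gets words that stop at underscores.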
2018-03-25 | Unify code that validates identifiers in Kakoune | Maxime Coste
Session/Client/User mode names are now required to be "identifiers": they must be in [a-zA-Z0-9_-]. Option names are the same except they do not allow '-', as they need to be made available through the env vars and '-' is not supported there.

Fixes #1946
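The two rules can be expressed with a single validator and a flag. The following is a minimal sketch; the function names and the choice to reject empty names are assumptions, not taken from the Kakoune source.

```cpp
#include <string>

// Hypothetical helper: true if c is in [a-zA-Z0-9_], or is '-' when
// allow_dash is set.
inline bool is_identifier_char(char c, bool allow_dash) noexcept
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' ||
           (allow_dash && c == '-');
}

// Session, client and user mode names: every character in [a-zA-Z0-9_-].
// Rejecting the empty string is an assumption of this sketch.
inline bool is_identifier(const std::string& name, bool allow_dash = true)
{
    if (name.empty())
        return false;
    for (char c : name)
        if (!is_identifier_char(c, allow_dash))
            return false;
    return true;
}

// Option names: same rule without '-', since they are exported as
// environment variables, where '-' is not supported.
inline bool is_option_identifier(const std::string& name)
{
    return is_identifier(name, false);
}
```

For example, "my-session" passes is_identifier but fails is_option_identifier, while "my_option" passes both.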
2017-10-12 | Refactor column highlighter to make it more robust | Maxime Coste
Support arbitrary orders for column highlighters (it was previously failing when column highlighters were not applied in column order).

Fix show_matching tab handling at the same time (horizontal scrolling, tab characters and show_matching were behaving badly).

Window highlighting now runs user highlighters, then built-ins, for each phase, instead of running all phases for user highlighters, then all phases for built-ins.

We now consider unprintable characters to be 1 column wide, as we know we will display them as "�".

Fixes #1615
Fixes #1023
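The width rule from the last paragraph can be sketched as below. This is an illustration under assumptions, not the code in unicode.hh: it uses the codepoint_width name from the 2016-10-01 rename entry, folds in the '\n' rule from the 2016-10-04 entry, and relies on the POSIX wcwidth() function, whose result depends on the active locale.

```cpp
#include <cwchar>   // on POSIX systems this makes ::wcwidth visible

using Codepoint = char32_t;

// Display width of a codepoint in terminal columns.
inline int codepoint_width(Codepoint c) noexcept
{
    // A newline occupies one column when displayed.
    if (c == U'\n')
        return 1;
    // wcwidth() returns -1 for unprintable codepoints; those are shown
    // as a single replacement character, so count them as 1 column.
    const int width = wcwidth(static_cast<wchar_t>(c));
    return width >= 0 ? width : 1;
}
```

Mapping the -1 result to a non-negative width also avoids the negative line lengths described in the 2017-07-07 entry.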
2017-10-07 | Add is_upper and is_lower helper unicode functions | Maxime Coste
2017-08-29 | Rename containers.hh to ranges.hh (and Container to Range) | Maxime Coste
2017-07-07 | Treat non printable characters as zero-width instead of -1 width | Maxime Coste
This fixes a bug when opening a file where a line has a lot of unprintable chars (like a binary file), which was confusing Kakoune into considering that the line length in columns was negative.
2017-06-26 | Use the extra_word_chars option in word based normal commands | Maxime Coste
The completion_extra_word_chars option is now gone, superseded by extra_word_chars, which is used both for completion and for normal mode.

Fixes #1304
2017-04-23 | Add noexcept specifiers to unicode and utf8 functions | Maxime Coste
2017-02-23 | Tweak some character categorization function implementations | Maxime Coste
2017-01-08 | Apply clang-tidy modernize to the codebase | Maxime Coste
2016-11-28 | Cleanup include dependencies a bit | Maxime Coste
2016-10-04 | Treat '\n' as 1 column wide | Maxime Coste
Fixes #842
2016-10-01 | Enable _XOPEN_SOURCE=700 on cygwin to get the wcwidth function | Maxime Coste
2016-10-01 | Rename get_width to codepoint_width | Maxime Coste
2016-10-01 | Support codepoints of variable width | Maxime Coste
Add a ColumnCount type and use it in place of CharCount wherever more appropriate. Take the column size of codepoints into account for vertical movements and docstring wrapping.

Fixes #811
2016-05-19 | Go back to libc locale and use c_regex_traits | Maxime Coste
Unfortunately, cygwin does not support c++ locales.
2016-05-11 | Use C++ locale based functions instead of the libc ones | Maxime Coste
2016-04-04 | Tweak categorize(Codepoint) implementation | Maxime Coste
2015-11-15 | Move is_basic_alpha to unicode.hh | Maxime Coste
2015-11-11 | Fix to_lower/to_upper handling to correctly support non unicode chars | Maxime Coste
Requires a proper unicode locale setup on the system.

Fixes #94
2015-07-01 | Refactor select_arguments and slightly change behaviour for non-inner | Maxime Coste
A non-inner argument selection contains the argument, the preceding whitespace, and the trailing comma if there is one, except for the first argument (which contains the whitespace after the comma) and the last argument (which contains the comma before it).
2015-04-29 | Use char32_t for Codepoint | Maxime Coste
2015-04-15 | Remove is_blank, which is identical to is_horizontal_blank | Maxime Coste
2014-01-05 | Use wide character function for categorizing codepoints | Maxime Coste
Previously we used the is... functions rather than the isw... ones. The is... functions did not support non-ASCII characters correctly.
2013-12-14 | Move template selectors to the header | Maxime Coste
2013-11-17 | move is_horizontal_blank to unicode.hh | Maxime Coste
2013-07-15 | use C isalnum instead of C++ std::isalnum | Maxime Coste
2013-04-09 | sort includes directives | Maxime Coste
2013-02-26 | use std::isalnum for is_word implementation in order to support unicode | Maxime Coste
2012-10-09 | add a unicode.hh header for Codepoint related functions, s/utf8::Codepoint/Codepoint/ | Maxime Coste