path: root/src/regex_impl.cc
Age | Commit message | Author
2018-11-04 | Dump start description as well when writing a regex dump | Maxime Coste
2018-11-03 | Remove most regex impl special casing for backwards matching | Maxime Coste
2018-11-01 | Support different type for iterators and sentinel in utf8 functions | Maxime Coste
2018-10-10 | Cleanup regex lookarounds implementation and reject incompatible regex | Maxime Coste
Fixes #2487
2018-07-08 | Tweak comment to make it less ambiguous | Maxime Coste
2018-06-24 | Use a dedicated vm op for dot when match-newline is false | Olivier Perret
2018-06-24 | Use bit-flags for storing regex options | Olivier Perret
2018-06-24 | Add support for regex flag to toggle dot-matches-newline | Olivier Perret
2018-04-30 | Fix wrong use of constexpr | Maxime Coste
2018-04-29 | Regex: Use only 128 characters in start desc and encode others as 0 | Maxime Coste
Using 257 entries consumed a lot of memory for no good reason, as codepoints > 127 are not common enough to be treated specially.
2018-04-28 | Merge remote-tracking branch 'lenormf/regex-format-string' into HEAD | Maxime Coste
2018-04-28 | fix potential overflow in dump_regex | Maxime Coste
2018-04-27 | regex_impl: Fix a potential format string flaw | Frank LENORMAND
2018-04-27 | Add a debug regex command to dump regex instructions | Maxime Coste
2018-04-27 | Use indices instead of pointers for saves/instruction in ThreadedRegexVM | Maxime Coste
Performance seems unaffected, but memory usage should be lower as the Thread struct is 4 bytes instead of 16.
2018-04-05 | Fix some trailing spaces and a tab that sneaked into the code base | Maxime Coste
2018-03-20 | Regex: Only allow SyntaxCharacter and - to be escaped in a character class | Maxime Coste
Allowing any character to be escaped is error prone, as it looks like \l could mean [:lower:] (as it used to with boost) when it only means a literal l. Fix the haskell.kak file as well. Fixes #1945
2018-03-05 | Regex: take the full subject range as a parameter | Maxime Coste
To allow more general lookarounds outside the actual search range, pass a second range (the actual subject). This allows us to remove various flags such as PrevAvailable or NotBeginOfSubject, which are now easy to check from the subject range. Fixes #1902
2018-02-24 | Regex: Improve comments and constify some variables | Maxime Coste
Reword various comments to make some tricky parts of the regex engine easier to understand.
2018-02-09 | Regex: Use a template argument instead of a regular one for "forward" | Maxime Coste
forward (which controls whether we are compiling for forward or backward matching) is always statically known, and compilation will first compile forward, then backward (if needed), so by having separate compiled functions we get rid of runtime branches.
2018-02-09 | Regex: minor code cleanup | Maxime Coste
2017-12-01 | Regex: Support forward and backward matching code in the same CompiledRegex | Maxime Coste
No need to have two separate regexes to handle forward and backward matching, just passing RegexCompileFlags::Backward will add support for backward matching to the regex. For backward only regex, pass RegexCompileFlags::NoForward as well to disable generation of forward matching code.
2017-12-01 | Regex: Do not allow private use codepoints literals | Maxime Coste
We use them to encode non-literals in lookarounds, so they can trigger bugs. Fixes #1737
2017-12-01 | Regex: rename StartChars to StartDesc | Maxime Coste
It only contains chars for now, but it is still, more generally, a description of where matches can start.
2017-11-30 | Regex: optimize parsing a bit | Maxime Coste
2017-11-30 | Regex: smarter handling of start chars computation for character class | Maxime Coste
2017-11-28 | Regex: Various small code tweaks | Maxime Coste
2017-11-28 | Regex: optimize compilation by reserving data | Maxime Coste
2017-11-28 | Regex: Tweak is_ctype implementation style | Maxime Coste
2017-11-25 | Regex: Replace generic 'Matchers' with specialized functionality | Maxime Coste
Introduce CharacterClass and CharacterType Regex Op, and optimize their evaluation.
2017-11-25 | Regex: do not decode utf8 in accept calls as they always run on ascii | Maxime Coste
2017-11-13 | Regex: add unit test for #1693 | Maxime Coste
2017-11-12 | Fix #1693: typo in RegexParser::character_class() | fsub
2017-11-01 | Regex: remove dead code | Maxime Coste
2017-11-01 | Regex: Tweak struct layouts of ParsedRegex data | Maxime Coste
2017-11-01 | Regex: Remove "Ast" from names in the ParsedRegex | Maxime Coste
It does not add much value, and makes names longer.
2017-11-01 | Regex: Optimize parsing and compilation | Maxime Coste
AstNodes are now POD, stored in a single vector, accessed through their index. The children list is implicit, with nodes storing only the node index at which their child graph ends. That makes reverse iteration slower, but that is only used for reverse matching regex, which are uncommon. In the general case compilation is now faster.
2017-11-01 | Regex: minor cleanup of the regex parsing code | Maxime Coste
2017-11-01 | Regex: small code cleanup in the Save compilation code | Maxime Coste
2017-11-01 | Regex: put the other char boolean inside the general start char map | Maxime Coste
2017-11-01 | Regex: Fix handling of all unicode codepoint as start chars | Maxime Coste
2017-11-01 | Regex: fix wrong fallthrough in dump_regex | Maxime Coste
2017-11-01 | Regex: Go back to instruction based search of next start | Maxime Coste
The previous method, which was a bit faster in the general use case, can hit some cases where we get quadratic behaviour and very slow matching. By using an instruction, we can guarantee our complexity of O(N*M), as we will never have more than N threads (N being the instruction count) and we run the threads once per codepoint in the subject string. That slows down the general case slightly, but ensures we do not have pathological cases. This new version is much faster than the previous instruction based search because it does not use a plain `.*` searcher, but a specific, smarter instruction specialized for finding the next start if we are in the correct conditions.
2017-11-01 | Regex: add support for \0, \cX, \xXX and \uXXXX escapes | Maxime Coste
2017-11-01 | Regex: compute if codepoints outside of the start chars map can start | Maxime Coste
2017-11-01 | Regex: abort compilation as soon as we hit the instruction count limit | Maxime Coste
2017-11-01 | Regex: add a unit test for why lookaheads don't count for start chars anymore | Maxime Coste
2017-11-01 | Regex: comment the mutables in CompiledRegex::Instruction and fix their init | Maxime Coste
2017-11-01 | Regex: Introduce a Regex memory domain to track usage separately | Maxime Coste
2017-11-01 | Regex: use binary search for character class ranges check | Maxime Coste
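
To illustrate the idea behind the last entry above (binary search over character class ranges), here is a minimal sketch. It is not the code from regex_impl.cc: the names Codepoint, CodepointRange and matches_class are hypothetical, and the sketch assumes the ranges are stored sorted by lower bound and do not overlap.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the names used in regex_impl.cc.
using Codepoint = std::uint32_t;

struct CodepointRange
{
    Codepoint min;
    Codepoint max; // inclusive upper bound
};

// Assumption: `ranges` is sorted by `min` and ranges do not overlap.
bool matches_class(const std::vector<CodepointRange>& ranges, Codepoint cp)
{
    // Find the first range whose lower bound is strictly greater than cp.
    auto it = std::upper_bound(ranges.begin(), ranges.end(), cp,
                               [](Codepoint c, const CodepointRange& r) { return c < r.min; });
    if (it == ranges.begin())
        return false; // cp is below every range

    // The only candidate is the range just before it; check its upper bound.
    --it;
    return cp <= it->max;
}
```

With K ranges in a class, this does O(log K) comparisons per codepoint instead of the O(K) of a linear scan, which matters most for classes with many ranges.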