|
<memory> is a costly header we can avoid by just implementing
UniquePtr ourselves, which is pretty straightforward in modern
C++. This saves around 10% of the compilation time here.
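Below is a minimal sketch of such a UniquePtr. It assumes single-object ownership only, with no array form and no custom deleter. `make_unique_ptr` is a hypothetical helper name, and the only include is <utility>, for std::exchange and std::forward.

```cpp
#include <utility>

// Hypothetical minimal owning pointer standing in for std::unique_ptr,
// so that <memory> does not need to be included.
template<typename T>
class UniquePtr
{
public:
    UniquePtr() = default;
    explicit UniquePtr(T* ptr) : m_ptr{ptr} {}
    ~UniquePtr() { delete m_ptr; }

    // Unique ownership: copying is forbidden, moving transfers the pointer.
    UniquePtr(const UniquePtr&) = delete;
    UniquePtr& operator=(const UniquePtr&) = delete;

    UniquePtr(UniquePtr&& other) noexcept
        : m_ptr{std::exchange(other.m_ptr, nullptr)} {}

    UniquePtr& operator=(UniquePtr&& other) noexcept
    {
        if (this != &other)
        {
            delete m_ptr;
            m_ptr = std::exchange(other.m_ptr, nullptr);
        }
        return *this;
    }

    T* get() const { return m_ptr; }
    T& operator*() const { return *m_ptr; }
    T* operator->() const { return m_ptr; }
    explicit operator bool() const { return m_ptr != nullptr; }

private:
    T* m_ptr = nullptr;
};

// Hypothetical counterpart of std::make_unique.
template<typename T, typename... Args>
UniquePtr<T> make_unique_ptr(Args&&... args)
{
    return UniquePtr<T>(new T(std::forward<Args>(args)...));
}
```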
|
|
This helps the compiler realize that data cannot change and does
not need reloading, improving codegen slightly.
|
|
|
|
Instead of jumping into the general CharClass code, detect simple
[a-z]-style ranges and use a specific op.
Also detect when a range can be converted to ignore case.
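Below is a sketch of how this detection could look. The names `CharRange`, `SimpleRangeOp` and `detect_simple_range` are hypothetical. Ranges are assumed to be sorted and non-overlapping, and case folding is ASCII-only here.

```cpp
#include <optional>
#include <vector>

struct CharRange { char32_t min; char32_t max; };

// Specialised op: one range, optionally matched after lowercasing the input.
struct SimpleRangeOp
{
    char32_t min;
    char32_t max;
    bool ignore_case;
};

constexpr char32_t to_lower_ascii(char32_t c)
{
    return (c >= U'A' and c <= U'Z') ? c - U'A' + U'a' : c;
}

// Single range ([a-z]) -> plain range op.
// Letter range plus its exact case counterpart ([a-fA-F]) -> ignore-case op.
// Anything else -> nullopt, i.e. fall back to the general CharClass code.
std::optional<SimpleRangeOp> detect_simple_range(const std::vector<CharRange>& ranges)
{
    if (ranges.size() == 1)
        return SimpleRangeOp{ranges[0].min, ranges[0].max, false};

    if (ranges.size() == 2)
    {
        // Sorted order puts the uppercase range first.
        const CharRange& upper = ranges[0];
        const CharRange& lower = ranges[1];
        bool upper_letters = upper.min >= U'A' and upper.max <= U'Z';
        bool lower_letters = lower.min >= U'a' and lower.max <= U'z';
        if (upper_letters and lower_letters and
            to_lower_ascii(upper.min) == lower.min and
            to_lower_ascii(upper.max) == lower.max)
            return SimpleRangeOp{lower.min, lower.max, true};
    }
    return std::nullopt;
}

// Matching with the specialised op is one or two comparisons.
bool matches(const SimpleRangeOp& op, char32_t c)
{
    if (op.ignore_case)
        c = to_lower_ascii(c);
    return c >= op.min and c <= op.max;
}
```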
|
|
decrement and post_increment do not get cmov-optimised as expected;
we can avoid this altogether by taking advantage of the fact that
capacity is always a power of two, and hence use a bitwise and to
loop around capacity.
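Below is a minimal illustration with hypothetical helper names. For a power-of-two capacity, `capacity - 1` is an all-ones mask. A single and therefore wraps the index in both directions, including when decrementing past zero, because unsigned underflow is well defined.

```cpp
#include <cstddef>

constexpr bool is_power_of_two(std::size_t n)
{
    return n != 0 and (n & (n - 1)) == 0;
}

// Branchy equivalent being replaced (illustrative):
//   index = (index == capacity - 1) ? 0 : index + 1;
constexpr std::size_t wrap_increment(std::size_t index, std::size_t capacity)
{
    return (index + 1) & (capacity - 1);
}

// index - 1 underflows to SIZE_MAX when index == 0; masking it yields
// capacity - 1, which is exactly the wrapped position.
constexpr std::size_t wrap_decrement(std::size_t index, std::size_t capacity)
{
    return (index - 1) & (capacity - 1);
}
```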
|
|
This got pushed by accident
This reverts commit d92496449d0c9655253ad16363685bb8446dc582.
|
|
It's unclear that maintaining a small instruction
size outweighs the cost of handling wrapping of
the current_step every 64K codepoints; this makes
the code simpler.
|
|
|
|
Split the last iteration out of the loop so that the optimizer can
elide most comparisons between pos and config.end, as they are always
different inside the loop and equal on the last call.
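Below is a sketch of the idea around a hypothetical `process` step. That step stands for per-position work which must also run once more at the end of the subject. The commit relies on the optimizer deducing pos != end inside the loop once the last call is peeled off. This sketch spells out the same split with `if constexpr`, so it is visible in the source.

```cpp
#include <string>
#include <string_view>

// `at_end` turns "pos == end" into a compile-time fact per instantiation.
template<bool at_end>
void process(const char* pos, std::string& trace)
{
    if constexpr (at_end)
        trace += '$';   // end of subject: nothing to read
    else
        trace += *pos;  // inside the loop: pos is always dereferenceable
}

std::string run(std::string_view subject)
{
    std::string trace;
    const char* pos = subject.data();
    const char* const end = pos + subject.size();
    for (; pos != end; ++pos)
        process<false>(pos, trace); // never at the end here
    process<true>(pos, trace);      // split-out last call: pos == end
    return trace;
}
```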
|
|
Ensure push/pull operations are inlined, except for the uncommon
grow.
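Below is a sketch of this inlining split on a hypothetical `Stack`, which is not the actual thread storage. It uses GCC/Clang attributes, which other compilers may warn about or ignore. push and pop are forced inline. grow is kept out of line and marked cold, so each push site only carries a compare and a rarely taken call. Elements are assumed to be trivially copyable, so realloc can move them.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>

template<typename T>
class Stack
{
    static_assert(std::is_trivially_copyable_v<T>, "storage is managed with realloc");

public:
    Stack() = default;
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;
    ~Stack() { std::free(m_data); }

    [[gnu::always_inline]] void push(T value)
    {
        if (m_size == m_capacity)
            grow(); // uncommon path, kept as an out-of-line call
        m_data[m_size++] = value;
    }

    [[gnu::always_inline]] T pop() { return m_data[--m_size]; }

    std::size_t size() const { return m_size; }

private:
    // Never inlined, and placed with other cold code by the compiler.
    [[gnu::noinline, gnu::cold]] void grow()
    {
        std::size_t new_capacity = m_capacity == 0 ? 16 : m_capacity * 2;
        void* new_data = std::realloc(m_data, new_capacity * sizeof(T));
        if (new_data == nullptr)
            std::abort();
        m_data = static_cast<T*>(new_data);
        m_capacity = new_capacity;
    }

    T* m_data = nullptr;
    std::size_t m_size = 0;
    std::size_t m_capacity = 0;
};
```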
|
|
|
|
|
|
Use tighter codegen for that pretty common use case.
|
|
|
|
I noticed that reverse searches ending in "." stopped working in
version 2024.05.08:
kak -n -e 'exec %{%cfoobar<ret><esc>gj<a-/>foo.<ret>}'
Bisects to ca7471c25 (Compute StartDesc with an offset to effective start,
2024-03-18), which updated the find_next_start() logic for the forward
case but not for the backward case. Add a symmetrical change and a test
case; that seems to fix it. Not 100% sure this is correct, but it
feels so.
|
|
Move more code into the implementation files to reduce the amount
of code pulled by headers.
|
|
|
|
|
|
The previous tradeoff of having a very small Thread struct is no
longer necessary, as we stopped memcpying Threads on swap_next in
d708b77186c1685dcbd2298246ada7d204acec2f.
This requires offsets to be used instead of indices for jump/split
ops.
|
|
|
|
|
|
|
|
Keep this closer to the point of use, and avoid pulling ref_ptr.hpp
into regex_impl.hpp.
|
|
This means `.{2,4}foo` will now only consider positions 4 or fewer
characters before an f as start candidates, instead of every character.
|
|
There is no need to push threads for each codepoint when we know
they will fail as the current codepoint is not a start candidate.
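Below is a minimal sketch of the check. It assumes a hypothetical 256-entry `start_chars` table marking the bytes that can begin a match. "Pushing a thread" is modelled as recording the position where that thread would start.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

std::vector<std::size_t> spawn_positions(std::string_view subject,
                                         const std::array<bool, 256>& start_chars)
{
    std::vector<std::size_t> threads;
    for (std::size_t pos = 0; pos < subject.size(); ++pos)
    {
        // Not a start candidate: a thread started here would fail on its
        // first step, so do not push it at all.
        if (not start_chars[static_cast<unsigned char>(subject[pos])])
            continue;
        threads.push_back(pos);
    }
    return threads;
}
```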
|
|
This crashes in unit tests
This reverts commit cde5f5a25838b2c9a2bf198b819a58d723b434a3.
|
|
This sometimes allocates saves too eagerly, but it removes a branch
in release saves that executes every time a thread fails, which seems
slightly better.
|
|
When creating a new save, we had to clear all iterators to have valid
values. This operation is relatively costly because it gets optimized
to a memset whose call overhead is pretty high (as we usually have
fewer than 32 bytes to clear). Bypass this by storing a bitmap of
valid iterators.
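Below is a sketch of the bitmap approach. It assumes at most 32 saved positions, so one uint32_t holds the validity bits. `Save`, `set` and `get` are illustrative names, and positions are plain `const char*` rather than the engine's iterators.

```cpp
#include <cstdint>

struct Save
{
    static constexpr int max_captures = 16; // assumption, must stay <= 32
    std::uint32_t valid = 0;                // bit i set => pos[i] holds a value
    const char* pos[max_captures];          // deliberately left uninitialised

    // Resetting is a single store instead of a memset of the whole array.
    void reset() { valid = 0; }

    void set(int index, const char* p)
    {
        pos[index] = p;
        valid |= std::uint32_t{1} << index;
    }

    // Only reads a slot whose bit is set; returns nullptr otherwise.
    const char* get(int index) const
    {
        return (valid & (std::uint32_t{1} << index)) ? pos[index] : nullptr;
    }
};
```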
|
|
Store values for all possible bytes and fill in utf8 multi-byte start
values when necessary.
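Below is a sketch of such a byte table; `StartTable`, `utf8_lead_byte` and `add_start_codepoint` are hypothetical names. ASCII codepoints mark their own byte. Non-ASCII codepoints mark the lead byte of their UTF-8 encoding, so the search can test raw bytes without decoding. Different codepoints can share a lead byte, which only yields extra candidates, never missed ones.

```cpp
#include <array>

using StartTable = std::array<bool, 256>;

// First byte of the UTF-8 encoding of cp.
constexpr unsigned char utf8_lead_byte(char32_t cp)
{
    if (cp < 0x80)    return static_cast<unsigned char>(cp);
    if (cp < 0x800)   return static_cast<unsigned char>(0xC0 | (cp >> 6));
    if (cp < 0x10000) return static_cast<unsigned char>(0xE0 | (cp >> 12));
    return static_cast<unsigned char>(0xF0 | (cp >> 18));
}

void add_start_codepoint(StartTable& table, char32_t cp)
{
    table[utf8_lead_byte(cp)] = true;
}
```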
|
|
|
|
|
|
Profiling shows that the utf8::read_codepoint call does not always
get inlined, and that this almost doubles the time spent in the function.
|
|
Most Save accesses are to modify the refcount. Now that the freelist
is index based it is not necessary to keep Save objects at fixed
memory locations.
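Below is a sketch of an index-based save pool; `SavePool` and its members are illustrative names. Threads would hold indices, and free slots are chained through those same indices. Nothing keeps a pointer into the vector, so it may reallocate and move the saves freely.

```cpp
#include <vector>

struct Save
{
    int refcount;
    int next_free; // next free slot, -1 when none
};

class SavePool
{
public:
    int allocate()
    {
        if (m_first_free != -1)
        {
            int index = m_first_free;
            m_first_free = m_saves[index].next_free;
            m_saves[index] = {1, -1};
            return index;
        }
        m_saves.push_back({1, -1}); // may move existing saves: indices stay valid
        return static_cast<int>(m_saves.size()) - 1;
    }

    void acquire(int index) { ++m_saves[index].refcount; }

    // When the refcount drops to zero, the slot goes to the freelist head.
    void release(int index)
    {
        if (--m_saves[index].refcount == 0)
        {
            m_saves[index].next_free = m_first_free;
            m_first_free = index;
        }
    }

    int refcount(int index) const { return m_saves[index].refcount; }

private:
    std::vector<Save> m_saves;
    int m_first_free = -1;
};
```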
|
|
Remove a redundant check for end and the double indirection needed to
get the instructions pointer.
|
|
|
|
If the first byte in the multi-byte utf8 sequence does not match,
it means the "other" character is not set, so none of the sequence
bytes will match (as they all have the MSB set). This tightens
the critical loop, which ends up running faster in most cases.
|
|
Fixes: https://github.com/mawww/kakoune/issues/4937
|
|
This paves the way towards being able to cancel long regex matching
operations
|
|
The previous code was assuming it was fine to push_next without
growing, which used to be the case with the previous implementation
because we had always popped the current thread that we were trying
to push. However, now that we use a ring buffer,
m_next_begin == m_next_end can mean either full or empty. We solve
this by assuming it means empty and never allowing the buffer to
become full, which means we need to grow after pushing to next if
it becomes full.
Fixes #4859
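Below is a simplified sketch of the "never full" rule as a single ring buffer, not the split current/next layout this fix deals with. `RingBuffer` and the initial capacity of 4 are assumptions. begin == end always means empty, and a push that makes the buffer full grows it immediately.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

template<typename T>
class RingBuffer
{
public:
    RingBuffer() : m_data(4) {} // capacity kept a power of two

    bool empty() const { return m_begin == m_end; }

    void push_back(T value)
    {
        m_data[m_end] = value;
        m_end = (m_end + 1) & mask();
        if (m_end == m_begin) // just became full: grow before it looks empty
            grow();
    }

    T pop_front()
    {
        T value = m_data[m_begin];
        m_begin = (m_begin + 1) & mask();
        return value;
    }

    // Valid because the buffer is never left full.
    std::size_t size() const { return (m_end - m_begin) & mask(); }

private:
    std::size_t mask() const { return m_data.size() - 1; }

    // Only called when full: unroll contents, oldest first, into a buffer
    // twice as large.
    void grow()
    {
        std::vector<T> bigger(m_data.size() * 2);
        for (std::size_t i = 0; i < m_data.size(); ++i)
            bigger[i] = m_data[(m_begin + i) & mask()];
        m_begin = 0;
        m_end = m_data.size();
        m_data = std::move(bigger);
    }

    std::vector<T> m_data;
    std::size_t m_begin = 0;
    std::size_t m_end = 0;
};
```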
|
|
Instead of potentially decoding for each thread, always decode, as
it's only slightly slower than finding the next codepoint (which will
be necessary anyway), and pass the codepoint to each thread.
|
|
We can just compute it whenever we reset last_step, which does not
happen often, and we know `forward` at compile time anyway.
|
|
Take advantage of ranges sorting to early out, make the logic
inline.
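Below is a sketch of the early out. It assumes the ranges are sorted by `min` and do not overlap; `CharRange` and `in_ranges` are illustrative names.

```cpp
#include <vector>

struct CharRange { char32_t min; char32_t max; };

inline bool in_ranges(const std::vector<CharRange>& ranges, char32_t c)
{
    for (const CharRange& range : ranges)
    {
        if (c < range.min)
            return false; // every remaining range starts even later
        if (c <= range.max)
            return true;
    }
    return false;
}
```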
|
|
We only grow when the ring buffer is full, which allows for a nice
simplification of the code.
Tell grow_ifn whether we pushed to current or next, so that we can
distinguish between being filled by next and being filled by current
when m_current == m_next_begin.
|
|
Instead of two stacks growing from the two ends of a buffer, use
a ring buffer growing from the same mid spot.
This avoids the costly memory copy every step when we set next
threads as the current ones.
|
|
This does not seem to actually speed up execution as threads will
be dropped on next step anyway
|
|
This could lead to reading past subject string end in certain
conditions
Fixes #4794
|
|
|
|
|
|
Instead of storing regexes in each region, move them to the core
highlighter in a hash map so that regexes shared between different
regions are only applied once per update instead of once per region.
Also change the iteration logic to apply all regexes together to each
changed line to improve memory locality on big buffers.
For the big_markdown.md file described in #4685 this reduces
initial display time from 3.55s to 2.41s on my machine.
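Below is a simplified sketch of the sharing scheme. It uses std::regex in place of the editor's own regex engine, and `SharedRegexes` with its members are illustrative names. Regions register patterns by string, and identical patterns share one compiled regex and one match list. `update` walks the lines in the outer loop and applies every distinct regex to each line.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>
#include <vector>

class SharedRegexes
{
public:
    struct Match { std::size_t line; std::size_t column; };

    // Returns a stable id; the regex is compiled only for new patterns.
    std::size_t add(const std::string& pattern)
    {
        auto [it, inserted] = m_ids.try_emplace(pattern, m_entries.size());
        if (inserted)
            m_entries.push_back(Entry{std::regex{pattern}, {}});
        return it->second;
    }

    // Line-major order: all regexes run on a line before moving on.
    void update(const std::vector<std::string>& lines)
    {
        for (Entry& entry : m_entries)
            entry.matches.clear();
        for (std::size_t l = 0; l < lines.size(); ++l)
            for (Entry& entry : m_entries)
                for (std::sregex_iterator it{lines[l].begin(), lines[l].end(), entry.regex}, end;
                     it != end; ++it)
                    entry.matches.push_back({l, static_cast<std::size_t>(it->position())});
    }

    std::size_t distinct_count() const { return m_entries.size(); }
    const std::vector<Match>& matches(std::size_t id) const { return m_entries[id].matches; }

private:
    struct Entry
    {
        std::regex regex;
        std::vector<Match> matches;
    };
    std::unordered_map<std::string, std::size_t> m_ids;
    std::vector<Entry> m_entries;
};
```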
|
|
|