Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent commit breaks LT(5,KC_J) on FC660C #8600

Closed
1 of 3 tasks
Syzygies opened this issue Mar 29, 2020 · 27 comments · Fixed by #8693
Closed
1 of 3 tasks

Recent commit breaks LT(5,KC_J) on FC660C #8600

Syzygies opened this issue Mar 29, 2020 · 27 comments · Fixed by #8693
Assignees

Comments

@Syzygies
Copy link

Syzygies commented Mar 29, 2020

Describe the Bug

I had fallen 27 commits behind master while working on several keyboards. After catching up to March 28 by merging upstream master into my fork, LT(5,KC_J) and similar keycodes stopped working on my FC660C (they still worked on my Planck EZ; I have a DZ60 I didn't test). Rather than momentarily activating the layer, I get stuck in that layer until I unplug and reconnect the keyboard.

While exploring this bug, I added an emergency escape DF(1) from every layer back to my base layer. These keys failed to work at all; I remained stuck in the 'momentary' layer.

Reverting the 27 upstream commits fixed the bug.

My keyboard is fc660c syzygies

My master branch was caught up with the upstream master as of yesterday, and causes the problem. My syzygies branch is my development branch, that again works once I reverted it to discard these 27 commits.

My fork's git history should be public, making precise which commits I've fallen behind on.

System Information

  • Keyboard: FC660C with a Hasu controller
    • Revision (if applicable):
  • Operating system: MacOS 10.14.6
  • AVR GCC version: avr-gcc (GCC) 8.3.0
  • ARM GCC version:
    arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2019-q3-update) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]
  • QMK Firmware version:
    0.8.79 (both branches report this same version)
  • Any keyboard related software installed?
    • AutoHotKey
    • Karabiner
    • Other: Alfred 4.0.9 [1144]

Additional Context

On request I can do a binary search to determine which of the 27 commits is responsible, and I can see if the DZ60 is affected. Usually I find my less obvious bugs by waiting a few hours and seeing if I slap my forehead. If someone's forehead slap doesn't resolve the bug, but suggests a commit to look at first, that would save me time.

I am a Kinesis Advantage survivor, so of course the first thing I considered was dropped USB events, but that wouldn't explain the escape DF(1) not working, or reverting the 27 upstream commits fixing everything. This is not sporadic behavior, it is 100% reproducible.

@Erovia
Copy link
Member

Erovia commented Mar 29, 2020

Hi @Syzygies, as I can see currently you are using KC_NO (XXXXXXX) on the temporary layers.
Could you do a test with KC_TRNS (_______) for me?

@tzarc
Copy link
Member

tzarc commented Mar 29, 2020

Struck the same thing on a test board locally -- KC_NO on target layer causes issue, KC_TRNS on target layer makes it work fine.

@fauxpark
Copy link
Member

When you use keycodes that activate a layer such as MO() or LT(), the keycode at the same spot on the target layer must either be KC_TRNS or the exact same keycode, otherwise you can lock yourself into that layer. This has always been the case, so perhaps there was actually a regression that has been fixed.

@Syzygies
Copy link
Author

Syzygies commented Mar 29, 2020

I will make all tests suggested so far, including binary search to locate the commit responsible for this.

As for "the keycode at the same spot on the target layer must either be KC_TRNS or the exact same keycode" I did notice this in the doc, and I noticed that it had never applied to me before. Is there an explanation anywhere? Anti-bounce? 16 bits? If it was "fixed" by accident, can we fix it for real? If I write my own loop, will this still apply to me?

It's dumb luck / serendipity that my ANSI symbol layer needs 28 symbols, and there are 30 core keys in common for every keyboard I can imagine. I was using the two holes for redundant Backspace and Delete keys, but I can instead put the two holes under the tap/hold keys (in my case FJ) that access this layer.

I really don't want to punch a hole in my number layer. I actually generate my keymap.c file (and the layout diagram) from a .tsv file. I could reprogram all this to create a pair of layers for each conceptual layer, differing only in the location of this required hole. I would feel silly doing this, if it were easy for me to instead eliminate this restriction. And the posted code is missing a layer I took back out as part of debugging. One reads elsewhere restrictions on going past 16 layers, e.g. the MT code. If I'm using two actual layers for each conceptual layer, I'll soon run out.

@tzarc
Copy link
Member

tzarc commented Mar 29, 2020

For reference, I bisected it, 5117dff6a26aec4eca04fb9787b4f428884739bc is the first bad commit.

@Syzygies
Copy link
Author

Syzygies commented Mar 30, 2020

I was unable to locate this commit on GitHub, but I have it in my terminal:

commit 5117dff
Author: Drashna Jaelre [email protected]
Date: Sun Mar 22 06:29:05 2020 -0700

Add Post Processing to process_record (#4892)

* Improve process_record system

Code based on @colinta's

* Rename and better handle functions

* Fix incorrect function call to process_record_user

* Add documentation for post_process_record

* Add both get_event_keycode and get_record_keycode functions

And add some comments about these functions

* Update code format

* Cleanup merge artifacts

My suspicion from viewing this in my terminal is that the breaking changes are in post_process_record_user(), but I'm not sure, and I don't understand the code that well.

@Syzygies
Copy link
Author

KC_NO on target layer causes issue, KC_TRNS on target layer makes it work fine.

I can also confirm this is my issue. Creating a pair of layers for each conceptual layer is a workaround. I'll try to figure out the exact code that is causing this, within the 5117dff6a26aec4eca04fb9787b4f428884739bc commit.

@fauxpark
Copy link
Member

fauxpark commented Mar 30, 2020

To elaborate on why this is not supposed to work, we have to consider how keycodes are resolved.

When a key is pressed, QMK searches through all active layers from highest to lowest. When, say, a MO(1) keycode is pressed from layer 0, layer 1 is not yet active, and so the keycode at that spot is ignored - the keypress is resolved as the MO(1), which activates layer 1.

On keyup, the same thing happens, but this time since the layer is active, the layer scan hits that keycode. If it's a KC_TRNS, the scan continues downwards looking for the next non-transparent key, which should usually be the MO(1) on layer 0. If the keycode is anything else (or MO(1) as it will have the exact same numerical value), then the layer will never be deactivated because the keyup event (and subsequent keydowns) that was resolved was for a different keycode.

@Syzygies
Copy link
Author

Syzygies commented Mar 30, 2020

To elaborate on why this is not supposed to work

Thank you, that helps. I've redone my layout after thinking this through. I've automated the necessary duplication of layers, to work around this restriction. My fc660c syzygies keymap shows these changes.

It's important to me that my keyboards offer a base layer that is as safe as possible. Say, a student wants to borrow my office keyboard (not a current concern), and I don't want them accidentally rearranging the orbits of the planets. So I was worried that because of this behavior, I'd be forced to give up a key on my safety layer, to support switching back to it from advanced layers. Fortunately, the following works: Provide access to one's function layer from the backspace key using LT(2,KC_BSPC), and put TG(1) somewhere on that function layer. This passes testing; I can go back and forth successfully while avoiding this issue.

So is this still a bug report, or a user whose luck ran out?

I'd still like to understand the code for myself; if this is easy enough to fix that it temporarily got fixed by accident, my vote would be to adopt the fix. This restriction leads to less efficient layouts. I have a GergoPlex: Heavy on order, but he has some really small boards too.

Ideally, doesn't one want to cache the identity of a keypress so the keyup is interpreted the same way? Instead rescanning active layers almost works, except for this issue. My guess is that this is historical, an artifact from TMK.

Does anyone actually rely on this as a feature? No one seems to have reported the bug while this behavior had accidentally changed.

In any case, it appears that I can ultimately duck all of this by providing my own process_record_user(), which I plan to do in any case. Overloading this many alpha keys can get dicey, no matter how I set the standard tweaks for this.

@mu-b
Copy link

mu-b commented Apr 1, 2020

Out of interest, how does this relate to the use of OSL? I have a LT(_LTRAN, KC_SPC) with OSL(_L2) set for key X on layer _LTRAN, hitting X again should invoke a macro on _L2, previously it would return to the original layer, this is now changed. I'm assuming that OSL semantics have now changed in line with the above in that I cannot use the key X on _L2 with now has to be KC_TRNS but instead some other key.

@newtonapple
Copy link
Contributor

I ran into the same issue. Here is as simple use case I created: https://github.com/newtonapple/qmk_firmware/blob/newtonapple/layerbug/layouts/community/ortho_4x12/newton/keymap.c.

I personally have few usecases where I put different KC_* codes under the LT keys. For example, I have 2 different LTs that are pointing to the same layer. I can alternate between the two LT keys to gain access to the keys hidden underneath them (see the code above). Forcing KC_TRNS to be the only key code behind LTs seems a bit restrictive.

@fauxpark
Copy link
Member

fauxpark commented Apr 5, 2020

Forcing KC_TRNS to be the only key code behind LTs seems a bit restrictive.

Necessarily so. Think about it: if you hold an LT(), activating the target layer, what key is now in the "pressed" state? It must have the same keycode, otherwise, when you release it, the event that is fired will not deactivate the layer.
The "workaround" for this (really, the correct way to do it) is to create two near-identical layers, one with the transparent key over one LT(), and one with the transparent key over the other LT().

@Syzygies
Copy link
Author

Syzygies commented Apr 5, 2020

That was exactly my fix; I use a Ruby script to generate actual layers from a tab separated values file, cloning near-identical layers as needed. And I'm just about out of layers.

The firmware sees which physical key (xy-matrix) is pressed and released. The firmware could cache each physical xy-matrix location separately, saving the meaning of each key when it goes down, and using that meaning when the key comes back up, ignoring all layer changes that have happened since. It doesn't do this; the firmware instead rescans using the now-current context. That is an artifact inherited from FMK, not a law of physics.

And somehow this all got fixed by accident, for a while! I still don't understand the relevant code.

@grassbl8d
Copy link

I also encountered this just now. The LT is permanently staying on that layer anymore. I thought I was doing something wrong as this has always been my configuration. My keyboard is kbdfans dz60 rgb ansiv2.

Thanks.

@drashna drashna self-assigned this Apr 5, 2020
@drashna
Copy link
Member

drashna commented Apr 5, 2020

Oh, I think I see what the issue is ....

@drashna
Copy link
Member

drashna commented Apr 5, 2020

Okay, narrowed down the issue. An unintended, and odd ... one.

Specifically, the cause is that this is getting called:

update_source_layers_cache(event.key, layer);

It's updating the layer cache, it would appear, and causing the issue.

Fortunately, this does appear to be a simple fix, at least. (tested already).

@drashna
Copy link
Member

drashna commented Apr 5, 2020

For anyone running into this issue, could you test the above PR and verify that it fixes the issue?

If you need help testing it, let me know.

@Syzygies
Copy link
Author

Syzygies commented Apr 5, 2020

I have to step out, I'll finish testing in an hour, but for anyone else, here is how I managed to pull in this commit:

git remote add drashna https://github.com/drashna/qmk_firmware.git
git fetch drashna
git cherry-pick 175fb39c478785529c1136ff1f2b03ef48fca483
git cherry-pick f62426cc358e742d68061cc4842c90344ec464e0

@newtonapple
Copy link
Contributor

@drashna I've tested this on my test case as well as the Planck rev6, XD75 & Candy Bar. The patch fixed the problem! 👍

@Syzygies
Copy link
Author

Syzygies commented Apr 6, 2020

Thanks!

This commit also fixes my reported problem. However, my testing uncovered a related problem I don't understand. It is definitely related to / entangled in the logic of this commit, though this commit did not introduce the problem.

I usually access my layer 3 (FJ) and layer 4 (DK) temporarily using FJ, DK as tap/hold keys. However, I sometimes want to lock those layers on. So I made the capslock key for those layers TG(3) and TG(4) respectively.

I thought of this after editing my script to create near-identical layers with transparent keys, to work around the issue of this bug report. That fix accidentally made my TG(3) and TG(4) capslock keys work, because I was always actually calling these keycodes from higher layers.

Using tap/hold FJ to access layer 3 for the purpose of executing TG(3) on layer 3 doesn't work. It's a no-op. However, TG(3) from any other layer does work. And once one is on layer 3 somehow (other than temporarily via tap/hold), TG(3) does work to get back off layer 3. TG(3) even works to get off layer 3 when one becomes stuck on layer 3 because of this bug. It just doesn't work to get to layer 3 in the first place, using tap/hold FJ.

Confused? I was. This was amusing to identify. My own fix for the previous problem was obscuring this problem.

Here is a cached copy of my layout: fc660c

I added TG(3) and TG(4) to layers 5 and 6 for test purposes, once I decided this must be the issue.

@drashna
Copy link
Member

drashna commented Apr 6, 2020

I added TG(3) and TG(4) to layers 5 and 6 for test purposes, once I decided this must be the issue.

That won't work. Those toggle layers 3 and 4 on/off. However, it reads "highest"/top down, so it will read 5 and 6 before 3 or 4. So ... it may break that. But ... I'll test TG in just a sec.

Edit: I tested this on my keymap (swapping out LT for TG, directly) and TG doesn't seem to be impacted.

@newtonapple and anyone else, if you could approve the PR in question (go to "files" in the PR and you can approve it there). You won't get a green check (it will be grey), but it helps us know that it's tested and fixes the issue.

@Syzygies
Copy link
Author

Syzygies commented Apr 6, 2020

That won't work. Those toggle layers 3 and 4 on/off. However, it reads "highest"/top down, so it will read 5 and 6 before 3 or 4. So ... it may break that. But ... I'll test TG in just a sec.

I edited my previous comment in light of further testing.

This is not the same as swapping out LT for TG, directly. It is an interaction between LT and TG.

To be clear, TG(3) doesn't work when only layers 0 and 1 are active, and I'm using tap/hold F or J to temporarily access layer 3 in order to reach the TG(3) on layer 3. The higher layers are disabled. TG(3) does work once one has layer 3 enabled by other means (such as the TG(3) key on layer 5, or by this bug before your fix). And this is true even though there are higher layers; they're disabled.

@drashna
Copy link
Member

drashna commented Apr 6, 2020

Could you post a link to your keymap, as well?
I see that you did above.

Also, a good question is, did this happen before the code change in question? (commit ID 5117dff )

@drashna
Copy link
Member

drashna commented Apr 6, 2020

I've made a couple more changes, and hopefully, that's still fixed (for those that applied my branch/PR)

@Syzygies
Copy link
Author

Syzygies commented Apr 7, 2020

Did this happen before the code change in question? (commit ID 5117dff )

As far as I can tell, this has never worked. I tested back to October 2019 (commit ID de29da973). I couldn't go back much further without my builds breaking for what I assume are unrelated reasons:

git checkout -b test de29da973
error: The following untracked working tree files would be overwritten by checkout:
	lib/lufa/.gitattributes [...]

Should I file a separate bug report, or wait to see what you conclude?

The workaround is clear: Don't depend on TG(3) working on layer 3.

However, layer three is where one wants the TG(3) key to be, if one has another way to reach layer 3. If I have many layers, it's easier to remember that CapsLock means lock that layer, for whatever layer that layer might be at the moment. For me, the FJ layer is that layer that I reach by holding the F or J keys.

I can't see a workaround, other than duplicating layers exactly as I did before you fixed this first bug. Or make capslock itself a tap/hold key, and put all the TG(n) keys on this new layer. As easy to remember as a language, and more efficient with introducing extra layers.

Thanks!

@EricGebhart
Copy link
Contributor

I have the exact same problems, I will test tomorrow. I have add LT() on almost every key of my home row for many years. Luckily this is a new kb. Obviously trans is not the best solution. I also have some osl() keys on the new xd75, those are on all layers via Trans, they do not work properly either.

@foosinn
Copy link

foosinn commented Apr 12, 2020

Hey,

i just rebased the two commits of the open PR to my WIP split atreus.

Awesome work! I did not notice any issues. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants