[201811] Invoke disk check periodically #8951

Merged
2 commits merged on Oct 16, 2021

@renukamanavalan renukamanavalan commented Oct 11, 2021

Why I did it

When the disk becomes read-only (due to a kernel bug), new remote users can no longer log in.
This fix enables remote user access even while the disk is in the read-only state.

How I did it

Keep monitoring the disk state.
If the disk becomes read-only, mount /etc and the home directory on tmpfs so that they become read-write (RW), and periodically write error logs.
With /etc and /home in the RW state, remote users can log in.
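
The approach above can be sketched roughly as follows. This is a minimal illustration, not the code in disk_check.py: the helper names `is_writable` and `plan_tmpfs_overlay`, the `/run/mount` work directory, and the use of an overlay mount with a tmpfs upper layer (to keep the existing contents of /etc and /home visible) are all assumptions made for the example. The function only builds the commands; it does not run them.

```python
import os
import tempfile


def is_writable(directory):
    """Return True if a file can actually be created in `directory`."""
    try:
        # Creating (and immediately discarding) a temp file fails with
        # OSError when the underlying filesystem is read-only or missing.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def plan_tmpfs_overlay(directories, work_root="/run/mount"):
    """Build (but do not run) the commands that would make `directories`
    writable again by overlaying a tmpfs-backed upper layer on each.

    `work_root` is a hypothetical location for the tmpfs mount.
    """
    commands = [["mount", "-t", "tmpfs", "tmpfs", work_root]]
    for directory in directories:
        name = directory.strip("/").replace("/", "_")
        upper = os.path.join(work_root, name, "upper")
        work = os.path.join(work_root, name, "work")
        commands.append(["mkdir", "-p", upper, work])
        options = "lowerdir={},upperdir={},workdir={}".format(directory, upper, work)
        commands.append(["mount", "-t", "overlay", "overlay", "-o", options, directory])
    return commands
```

A periodic caller would invoke `is_writable("/etc")` and, only when it returns False, execute the planned commands and log an error. Because the upper layer lives on tmpfs, anything written there is lost on reboot.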

How to verify it

  1. Make sure TACACS is configured.
  2. Make the disk read-only (sudo bash -c "echo u > /proc/sysrq-trigger").
  3. Try logging in as a remote user who has not logged in to this device before.
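
To confirm that step 2 took effect before attempting the login, the mount options can be inspected. The sketch below parses text in the /proc/mounts format; the function names and the sample mount points are illustrative assumptions, not part of this PR.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set of the last entry for `mount_point` in
    /proc/mounts-formatted text, or None if it is not mounted."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def is_read_only(mounts_text, mount_point):
    """Return True if `mount_point` is mounted with the `ro` option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options
```

On a device, the input would come from reading /proc/mounts, for example `is_read_only(open("/proc/mounts").read(), "/host")`, where `/host` stands in for whichever mount point backs the disk.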

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog


Why I did it
Helps with a periodic scan of the disk for the read-only (RO) state.
If the RO state is found, this script applies a transient fix and raises an error message.
@renukamanavalan renukamanavalan self-assigned this Oct 11, 2021
@yxieca yxieca changed the title Invoke disk check periodically [201811] Invoke disk check periodically Oct 12, 2021

yxieca commented Oct 12, 2021

@renukamanavalan please complete the PR template; in particular, please elaborate on how you tested the change. I think this change needs a matching submodule update.

@renukamanavalan

Let me update the template:

Tests done:
Manually copied disk_check.py onto a device running 201811, at /usr/local/bin/.
Updated /etc/monit/conf.d/sonic-host as in PR 8951 (buildimage).
Restarted the monit service.
Ensured the switch has TACACS configured.
Made the disk read-only.
After a pause, tested logging in with the credentials of a remote user who had not logged in to this device before.
In other words, ensured that the login created an entry in /etc/passwd and a home directory for this user.
It worked as expected.
Rebooted the device.
Confirmed that there is no trace of that user on the device (as the updates were made on tmpfs).
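
The /etc/passwd and home-directory check in the steps above could be scripted along these lines. This is only a sketch: the helper names and the user name `remoteuser` are hypothetical, and the check reads passwd-formatted text directly rather than going through NSS, so that it reflects what is actually in the file.

```python
import os


def passwd_entry(passwd_text, username):
    """Return the parsed passwd entry for `username`, or None if absent."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # name:password:uid:gid:gecos:home:shell
        if len(fields) == 7 and fields[0] == username:
            return {
                "name": fields[0],
                "uid": int(fields[2]),
                "gid": int(fields[3]),
                "home": fields[5],
                "shell": fields[6],
            }
    return None


def user_provisioned(passwd_text, username, isdir=os.path.isdir):
    """Return True if the user has a passwd entry and the home directory
    named in that entry exists. `isdir` is injectable for testing."""
    entry = passwd_entry(passwd_text, username)
    return entry is not None and isdir(entry["home"])
```

Under this sketch, `user_provisioned(open("/etc/passwd").read(), "remoteuser")` would be expected to return True after the first login and False again after the reboot, since the entry only ever existed on tmpfs.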

@renukamanavalan

@renukamanavalan please complete the PR template; in particular, please elaborate on how you tested the change. I think this change needs a matching submodule update.

Let me update the template soon.
Yes, this requires a submodule update with PR sonic-net/sonic-utilities#1873.

@yxieca yxieca merged commit 52366b0 into sonic-net:201811 Oct 16, 2021
yxieca pushed a commit to yxieca/sonic-buildimage that referenced this pull request Oct 25, 2021
* Add DHCPv6 minigraph parsing support

Co-authored-by: shlomibitton <[email protected]>

Logrotate for wtmp and btmp files to fix size getting too large. (sonic-net#8744)

Signed-off-by: Abhishek Dosi <[email protected]>

[201811][utilities][swss][snmpagent] advance sub module head

snmpagent
* 187aa10 2021-09-16 | [201811][RFC1213]: Initialize lag oid map in reinit_data (sonic-net#233) (github/201811) [SuvarnaMeenakshi]

swss:
* 3503705 2021-09-05 | [201811][Cherry-pick] [acl mirror action] Mirror session ref count fix at acl rule attachment (sonic-net#1898) (HEAD -> 201811, github/201811) [bingwang-ms]

utilities:
* f3f8667 2021-10-15 | [201811] disk_check.py: Allow remote user access when disk is read-only (sonic-net#1873) (HEAD -> 201811, github/201811) [Renuka Manavalan]
* 6b351c9 2021-10-14 | [201811]  Remove exec from platform_reboot_plugin call to handle any hang issue. (sonic-net#1880) [Sujin Kang]
* d8d0461 2021-07-29 | [minigraph][port_config] Consume port_config.json while reloading minigraph (sonic-net#1726) [Blueve]

Signed-off-by: Ying Xie <[email protected]>

[201811] Invoke disk check periodically (sonic-net#8951)

* Invoke disk check periodically. (sonic-net#7374)

Why I did it
Helps with a periodic scan of the disk for the read-only (RO) state.
If the RO state is found, this script applies a transient fix and raises an error message.

Save DB dump after warm/fast reboot (sonic-net#8913)

Backporting the master branch change sonic-net#8803.

Save the Redis DB dump after warm reboot.

[201811][swss] advance swss submodule head (sonic-net#9049)

* e0b115a 2021-10-22 | [copp] add dhcpv6 copp rules (sonic-net#1979) (HEAD -> 201811, github/201811) [Ying Xie]

Signed-off-by: Ying Xie <[email protected]>

[swssconfig] load dhcpv6 copp rules by default (sonic-net#9047)

Why I did it
Need to enable the DHCPv6 COPP rule.

How I did it
Add a separate DHCPv6 COPP rule config file and load it during cold reboot.

How to verify it
Cold reboot, then verify that the config is loaded and the DHCPv6 rules are installed.

Signed-off-by: Ying Xie [email protected]

[warmboot finalizer] load dhcpv6 copp rules when missing (sonic-net#9048)

Why I did it
Need to enable DHCPv6 COPP rules.

How I did it
Load the separate DHCPv6 COPP rules after warm reboot if the rules are missing.

How to verify it
Warm reboot from an image that doesn't have DHCPv6 COPP rules installed.
Warm reboot from an image that already has DHCPv6 COPP rules installed.
In either case, the script did the right thing and installed the COPP rules only if they were missing.

Signed-off-by: Ying Xie [email protected]
yxieca pushed a commit that referenced this pull request Oct 26, 2021