Skip to content
This repository has been archived by the owner on Oct 5, 2018. It is now read-only.

gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors #12

Merged
merged 1 commit into from
Mar 11, 2015

Conversation

xin3liang
Copy link

Signed-off-by: Xinliang Liu [email protected]

fboudra added a commit that referenced this pull request Mar 11, 2015
gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors
@fboudra fboudra merged commit 8b22e2c into 96boards:hikey Mar 11, 2015
@xin3liang
Copy link
Author

Sorry ldts and koenkooi, maybe I reply all the above dounts to wrong mail list before, thus you can't see my replies. So i want to reply again here and describe more clear for this commit.
This commit intend to revert the 720p timing setting on pull request 12: "gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors"
Fathi Boudra and me found that the 720p timing change in pull request 12 is not stable. sometimes it can display sometimes not. So i push this pull request 22 to revert the timing change.
BTW, To koenkooi, i have no idea about the CEA 681-E Format 4 , could you explain a bit more to me? Thanks.

@koenkooi
Copy link

CEA 681-E Format 4 is the official name for '720p' in the hdmi spec. If you format your signal to that the tv has to display that to pass the requirements for the 'HDMI' logo

@xin3liang
Copy link
Author

Yes, i think our 720P timing is the CEA 681-E format 4, except the pixel clock is different. our is 75000kHz and CEA 681-E Format 4 is 13468kHz. But due to our another unfix signal issue, we can't set 13468kHz yet.
/* 1280x720 @ 60 Hz, 45 kHz hsync, CEA 681-E Format 4 */
"hd720", 60, 1280, 720, 13468, 220, 110, 20, 5, 40, 5,

johnstultz-work pushed a commit that referenced this pull request Jan 5, 2016
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: [email protected] # 3.9+
[[email protected]: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
idlethread pushed a commit that referenced this pull request Apr 21, 2016
commit ec183d2 upstream.

Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

 Program received signal SIGSEGV, Segmentation fault.
  0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618
(gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg
faults if tracefs is not readable, because the error reporting structure
is null, as errors are not reported when automatically adding
tracepoints.  Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports
perf_event_attr.context_switch, that is the default way that will be
used for tracking context switches, only in older kernels, like 4.2, in
a machine with Intel PT (e.g. Broadwell) for non-priviledged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid.

However, there are many situation a parse_event() can be called without
parse_events_error. See result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r'

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Tong Zhang <[email protected]>
Cc: Wang Nan <[email protected]>
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
idlethread pushed a commit that referenced this pull request May 9, 2016
…CKING

The log is as blow:

=================================
[ INFO: inconsistent lock state ]
4.4.8+ #12 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kworker/u64:1/168 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&hisi_hba->lock)->rlock){?.....}, at: [<ffffffc00052c708>] alloc_dev_quirk_v2_hw+0x48/0xec
{IN-HARDIRQ-W} state was registered at:
  [<ffffffc0000fc764>] mark_lock+0x19c/0x6a0
  [<ffffffc0000fdc14>] __lock_acquire+0xa2c/0x1d00
  [<ffffffc0000ff654>] lock_acquire+0x58/0x7c
  [<ffffffc0008b609c>] _raw_spin_lock_irqsave+0x54/0x6c
  [<ffffffc00052d3c0>] int_chnl_int_v2_hw+0x1c4/0x248
  [<ffffffc0001098e8>] handle_irq_event_percpu+0x9c/0x144
  [<ffffffc0001099d4>] handle_irq_event+0x44/0x74
  [<ffffffc00010cd68>] handle_fasteoi_irq+0xb4/0x188
  [<ffffffc000108ea8>] generic_handle_irq+0x24/0x38
  [<ffffffc0001091fc>] __handle_domain_irq+0x60/0xac
  [<ffffffc00008261c>] gic_handle_irq+0xcc/0x168
  [<ffffffc0000855ac>] el1_irq+0x6c/0xe0
  [<ffffffc0000f7414>] default_idle_call+0x1c/0x34
  [<ffffffc0000f7654>] cpu_startup_entry+0x1d4/0x228
  [<ffffffc0008aecd8>] rest_init+0x150/0x160
  [<ffffffc000c4b95c>] start_kernel+0x3a4/0x3b8
  [<00000000008bb000>] 0x8bb000
irq event stamp: 32661
hardirqs last  enabled at (32661): [<ffffffc0008b41a8>] __mutex_unlock_slowpath+0x108/0x18c
hardirqs last disabled at (32660): [<ffffffc0008b40e4>] __mutex_unlock_slowpath+0x44/0x18c
softirqs last  enabled at (25114): [<ffffffc0000bde68>] __do_softirq+0x210/0x27c
softirqs last disabled at (25095): [<ffffffc0000be224>] irq_exit+0x9c/0xe8

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&hisi_hba->lock)->rlock);
  <Interrupt>
    lock(&(&hisi_hba->lock)->rlock);

 *** DEADLOCK ***

2 locks held by kworker/u64:1/168:
 #0:  ("%s"shost->work_q_name){++++.+}, at: [<ffffffc0000d2980>] process_one_work+0x134/0x3cc
 #1:  ((&sw->work)#2){+.+.+.}, at: [<ffffffc0000d2980>] process_one_work+0x134/0x3cc

stack backtrace:
CPU: 4 PID: 168 Comm: kworker/u64:1 Not tainted 4.4.8+ #12
Hardware name: Huawei Technologies Co., Ltd. D03/D03, BIOS 1.12 01/01/1900
Workqueue: scsi_wq_1 sas_discover_domain
Call trace:
[<ffffffc000089988>] dump_backtrace+0x0/0x114
[<ffffffc000089ab0>] show_stack+0x14/0x1c
[<ffffffc00035ac50>] dump_stack+0xb4/0xf0
[<ffffffc0000fc524>] print_usage_bug+0x210/0x2b4
[<ffffffc0000fcbc4>] mark_lock+0x5fc/0x6a0
[<ffffffc0000fd9e8>] __lock_acquire+0x800/0x1d00
[<ffffffc0000ff654>] lock_acquire+0x58/0x7c
[<ffffffc0008b5edc>] _raw_spin_lock+0x44/0x58
[<ffffffc00052c708>] alloc_dev_quirk_v2_hw+0x48/0xec
[<ffffffc000528214>] hisi_sas_dev_found+0x48/0x1b8
[<ffffffc00051a9b8>] sas_notify_lldd_dev_found+0x34/0xe0
[<ffffffc00051e5e8>] sas_discover_root_expander+0x58/0x128
[<ffffffc00051b38c>] sas_discover_domain+0x4bc/0x564
[<ffffffc0000d29ec>] process_one_work+0x1a0/0x3cc
[<ffffffc0000d2d50>] worker_thread+0x138/0x438
[<ffffffc0000d9494>] kthread+0xdc/0xf0
[<ffffffc000085c50>] ret_from_fork+0x10/0x40

Signed-off-by: Wei Xu <[email protected]>
Reviewed-by: John Garry <[email protected]>
nyadla-sys pushed a commit to nyadla-sys/linux that referenced this pull request Apr 24, 2017
[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 96boards#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 96boards#9 [] tcp_rcv_established at ffffffff81580b64
96boards#10 [] tcp_v4_do_rcv at ffffffff8158b54a
96boards#11 [] tcp_v4_rcv at ffffffff8158cd02
96boards#12 [] ip_local_deliver_finish at ffffffff815668f4
96boards#13 [] ip_local_deliver at ffffffff81566bd9
96boards#14 [] ip_rcv_finish at ffffffff8156656d
96boards#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
96boards#21 [] net_rx_action at ffffffff8152bac2
96boards#22 [] __do_softirq at ffffffff81084b4f
96boards#23 [] call_softirq at ffffffff8164845c
96boards#24 [] do_softirq at ffffffff81016fc5
96boards#25 [] irq_exit at ffffffff81084ee5
96boards#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nyadla-sys pushed a commit to nyadla-sys/linux that referenced this pull request Apr 24, 2017
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 96boards#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
96boards#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
96boards#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
96boards#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
96boards#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
96boards#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
96boards#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
96boards#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
96boards#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Cc: [email protected]
Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
--
 fs/xfs/xfs_bmap_util.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants