gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors #12

xin3liang · 2015-03-11T03:41:15Z

Signed-off-by: Xinliang Liu <[email protected]>

gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors

xin3liang · 2015-03-19T03:33:18Z

Sorry ldts and koenkooi, I may have sent my earlier replies to the doubts above to the wrong mailing list, so you could not see them. I am replying again here to describe this commit more clearly.
This commit intends to revert the 720p timing setting from pull request 12: "gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors"
Fathi Boudra and I found that the 720p timing change in pull request 12 is not stable: sometimes the display works and sometimes it does not. So I pushed this pull request 22 to revert the timing change.
BTW, koenkooi, I am not familiar with CEA 681-E Format 4. Could you explain a bit more? Thanks.

koenkooi · 2015-03-19T06:13:54Z

CEA 681-E Format 4 is the official name for '720p' in the HDMI spec. If you format your signal to that, the TV has to display it in order to pass the requirements for the 'HDMI' logo.

xin3liang · 2015-03-20T03:02:07Z

Yes, I think our 720P timing is CEA 681-E Format 4, except that the pixel clock is different: ours is 75000 kHz, while the CEA 681-E Format 4 entry lists 13468. But because of another signal issue of ours that is not fixed yet, we can't use the Format 4 value yet.
/* 1280x720 @ 60 Hz, 45 kHz hsync, CEA 681-E Format 4 */
"hd720", 60, 1280, 720, 13468, 220, 110, 20, 5, 40, 5,

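For reference, here is a rough sketch (not taken from the driver) of how the numbers in the `hd720` entry relate to each other. It assumes the entry follows the field order of the kernel's `struct fb_videomode`, where the fifth value is `pixclock`, a pixel period in picoseconds rather than a frequency in kHz. The struct `mode_timing` and the helpers `picos_to_khz` and `refresh_millihz` are hypothetical names used only for illustration; the conversion uses the same arithmetic as the kernel's `PICOS2KHZ()` macro.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of a modedb-style timing entry. Field names follow
 * struct fb_videomode, but this struct itself is hypothetical. */
struct mode_timing {
    uint32_t xres, yres;
    uint32_t pixclock_ps;               /* pixel period in picoseconds */
    uint32_t left_margin, right_margin;
    uint32_t upper_margin, lower_margin;
    uint32_t hsync_len, vsync_len;
};

/* Same arithmetic as PICOS2KHZ(): kHz = 10^9 / picoseconds. */
static uint32_t picos_to_khz(uint32_t ps)
{
    assert(ps != 0);
    return (uint32_t)(1000000000UL / ps);
}

/* Refresh rate in millihertz for a given pixel clock in kHz. */
static uint32_t refresh_millihz(const struct mode_timing *m, uint32_t clk_khz)
{
    uint64_t htotal = (uint64_t)m->xres + m->left_margin +
                      m->right_margin + m->hsync_len;
    uint64_t vtotal = (uint64_t)m->yres + m->upper_margin +
                      m->lower_margin + m->vsync_len;

    /* kHz * 10^3 = Hz, and Hz * 10^3 = mHz, hence the factor 10^6. */
    return (uint32_t)((uint64_t)clk_khz * 1000000ULL / (htotal * vtotal));
}

/* The values quoted above: 1280, 720, 13468, 220, 110, 20, 5, 40, 5. */
static const struct mode_timing hd720 = {
    1280, 720, 13468, 220, 110, 20, 5, 40, 5
};
```

Under these assumptions, `picos_to_khz(hd720.pixclock_ps)` gives 74250 kHz, which yields a 60 Hz refresh with the 1650 x 750 total frame, while a 75000 kHz clock with the same totals comes out at roughly 60.6 Hz.
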
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace:

PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: [email protected] # 3.9+
[[email protected]: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
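The decision described in this commit message can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the code in mm/vmscan.c: the enum `wb_action` and the function `writeback_action` are hypothetical, and the first two conditions (global reclaim, page not yet marked for immediate reclaim) reflect my understanding of "case 2" rather than anything stated in the message. The third condition is the one the message names: reclaim may block on writeback only when `may_enter_fs` holds.

```c
#include <stdbool.h>

/* What reclaim does with a page that is still under writeback (toy model). */
enum wb_action {
    WB_SKIP_AND_MARK,   /* mark the page for immediate reclaim and move on */
    WB_WAIT             /* block until writeback on the page completes */
};

/*
 * Hypothetical model of the check after the fix: waiting is allowed only
 * for memcg reclaim of a page already marked for reclaim, and only when
 * the caller is allowed to enter the filesystem (may_enter_fs).
 */
static enum wb_action writeback_action(bool global_reclaim,
                                       bool page_marked_reclaim,
                                       bool may_enter_fs)
{
    if (global_reclaim || !page_marked_reclaim || !may_enter_fs)
        return WB_SKIP_AND_MARK;
    return WB_WAIT;
}
```

In this model a GFP_NOFS caller such as the ext4 writepages path in the backtrace never reaches `WB_WAIT`, so it cannot block on a page whose bio it has not submitted yet.
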

commit ec183d2 upstream.

Fixes segmentation fault using, for instance:

  (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls
  Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch")
      at util/parse-events.c:410
  (gdb) bt
  #0  0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch")
      at util/parse-events.c:410
  #1  0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:433
  #2  0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:498
  #3  0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0)
      at util/parse-events.c:936
  #4  0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0)
      at util/parse-events.y:391
  #5  0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258)
      at util/parse-events.c:1361
  #6  0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0)
      at util/parse-events.c:1401
  #7  0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch")
      at util/record.c:253
  #8  0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90)
      at arch/x86/util/intel-pt.c:364
  #9  0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>)
      at arch/x86/util/intel-pt.c:664
  #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>)
      at util/auxtrace.c:539
  #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0)
      at builtin-record.c:1264
  #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60)
      at perf.c:390
  #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60)
      at perf.c:451
  #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0)
      at perf.c:495
  #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60)
      at perf.c:618
  (gdb)

Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using.

Committer note:

This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-privileged users.

Further info from a similar patch by Wang:

The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situations in which parse_event() can be called without parse_events_error. See result of

  $ grep 'parse_events(.*NULL)' ./tools/perf/ -r'

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Tong Zhang <[email protected]>
Cc: Wang Nan <[email protected]>
Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
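The "check before using" pattern this message describes might look roughly like the following. It is a minimal sketch, not the perf source: `struct parse_error` is a hypothetical stand-in for perf's error-reporting structure, and `report_tracepoint_error` and its message text are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for perf's error-reporting structure. */
struct parse_error {
    int  idx;
    char str[128];
};

/*
 * Record details about a tracepoint that could not be opened.
 * Returns 1 if the details were stored, 0 if the caller passed no
 * structure (as happens when tracepoints are added automatically).
 */
static int report_tracepoint_error(struct parse_error *e, int err,
                                   const char *sys, const char *name)
{
    if (!e)     /* nothing to fill in: return instead of dereferencing NULL */
        return 0;

    e->idx = 0;
    snprintf(e->str, sizeof(e->str),
             "can't access trace event %s:%s (error %d)", sys, name, err);
    return 1;
}
```

A caller that does not care about diagnostics can then pass `NULL`, matching the `e=0x0` argument visible in frame #0 of the backtrace, without crashing.
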

…CKING

The log is as below:

=================================
[ INFO: inconsistent lock state ]
4.4.8+ #12 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kworker/u64:1/168 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&hisi_hba->lock)->rlock){?.....}, at: [<ffffffc00052c708>] alloc_dev_quirk_v2_hw+0x48/0xec
{IN-HARDIRQ-W} state was registered at:
  [<ffffffc0000fc764>] mark_lock+0x19c/0x6a0
  [<ffffffc0000fdc14>] __lock_acquire+0xa2c/0x1d00
  [<ffffffc0000ff654>] lock_acquire+0x58/0x7c
  [<ffffffc0008b609c>] _raw_spin_lock_irqsave+0x54/0x6c
  [<ffffffc00052d3c0>] int_chnl_int_v2_hw+0x1c4/0x248
  [<ffffffc0001098e8>] handle_irq_event_percpu+0x9c/0x144
  [<ffffffc0001099d4>] handle_irq_event+0x44/0x74
  [<ffffffc00010cd68>] handle_fasteoi_irq+0xb4/0x188
  [<ffffffc000108ea8>] generic_handle_irq+0x24/0x38
  [<ffffffc0001091fc>] __handle_domain_irq+0x60/0xac
  [<ffffffc00008261c>] gic_handle_irq+0xcc/0x168
  [<ffffffc0000855ac>] el1_irq+0x6c/0xe0
  [<ffffffc0000f7414>] default_idle_call+0x1c/0x34
  [<ffffffc0000f7654>] cpu_startup_entry+0x1d4/0x228
  [<ffffffc0008aecd8>] rest_init+0x150/0x160
  [<ffffffc000c4b95c>] start_kernel+0x3a4/0x3b8
  [<00000000008bb000>] 0x8bb000
irq event stamp: 32661
hardirqs last enabled at (32661): [<ffffffc0008b41a8>] __mutex_unlock_slowpath+0x108/0x18c
hardirqs last disabled at (32660): [<ffffffc0008b40e4>] __mutex_unlock_slowpath+0x44/0x18c
softirqs last enabled at (25114): [<ffffffc0000bde68>] __do_softirq+0x210/0x27c
softirqs last disabled at (25095): [<ffffffc0000be224>] irq_exit+0x9c/0xe8

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&hisi_hba->lock)->rlock);
  <Interrupt>
    lock(&(&hisi_hba->lock)->rlock);

 *** DEADLOCK ***

2 locks held by kworker/u64:1/168:
 #0: ("%s"shost->work_q_name){++++.+}, at: [<ffffffc0000d2980>] process_one_work+0x134/0x3cc
 #1: ((&sw->work)#2){+.+.+.}, at: [<ffffffc0000d2980>] process_one_work+0x134/0x3cc

stack backtrace:
CPU: 4 PID: 168 Comm: kworker/u64:1 Not tainted 4.4.8+ #12
Hardware name: Huawei Technologies Co., Ltd. D03/D03, BIOS 1.12 01/01/1900
Workqueue: scsi_wq_1 sas_discover_domain
Call trace:
[<ffffffc000089988>] dump_backtrace+0x0/0x114
[<ffffffc000089ab0>] show_stack+0x14/0x1c
[<ffffffc00035ac50>] dump_stack+0xb4/0xf0
[<ffffffc0000fc524>] print_usage_bug+0x210/0x2b4
[<ffffffc0000fcbc4>] mark_lock+0x5fc/0x6a0
[<ffffffc0000fd9e8>] __lock_acquire+0x800/0x1d00
[<ffffffc0000ff654>] lock_acquire+0x58/0x7c
[<ffffffc0008b5edc>] _raw_spin_lock+0x44/0x58
[<ffffffc00052c708>] alloc_dev_quirk_v2_hw+0x48/0xec
[<ffffffc000528214>] hisi_sas_dev_found+0x48/0x1b8
[<ffffffc00051a9b8>] sas_notify_lldd_dev_found+0x34/0xe0
[<ffffffc00051e5e8>] sas_discover_root_expander+0x58/0x128
[<ffffffc00051b38c>] sas_discover_domain+0x4bc/0x564
[<ffffffc0000d29ec>] process_one_work+0x1a0/0x3cc
[<ffffffc0000d2d50>] worker_thread+0x138/0x438
[<ffffffc0000d9494>] kthread+0xdc/0xf0
[<ffffffc000085c50>] ret_from_fork+0x10/0x40

Signed-off-by: Wei Xu <[email protected]>
Reviewed-by: John Garry <[email protected]>

[ Upstream commit 45caeaa ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A common crashing back trace is:

  #8 [] page_fault at ffffffff8163e648
     [exception RIP: __tcp_ack_snd_check+74]
  .
  .
  #9 [] tcp_rcv_established at ffffffff81580b64
 #10 [] tcp_v4_do_rcv at ffffffff8158b54a
 #11 [] tcp_v4_rcv at ffffffff8158cd02
 #12 [] ip_local_deliver_finish at ffffffff815668f4
 #13 [] ip_local_deliver at ffffffff81566bd9
 #14 [] ip_rcv_finish at ffffffff8156656d
 #15 [] ip_rcv at ffffffff81566f06
 #16 [] __netif_receive_skb_core at ffffffff8152b3a2
 #17 [] __netif_receive_skb at ffffffff8152b608
 #18 [] netif_receive_skb at ffffffff8152b690
 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
 #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
 #21 [] net_rx_action at ffffffff8152bac2
 #22 [] __do_softirq at ffffffff81084b4f
 #23 [] call_softirq at ffffffff8164845c
 #24 [] do_softirq at ffffffff81016fc5
 #25 [] irq_exit at ffffffff81084ee5
 #26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

  static bool tcp_in_quickack_mode(struct sock *sk)
  {
          const struct inet_connection_sock *icsk = inet_csk(sk);
          const struct dst_entry *dst = __sk_dst_get(sk);

          return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                  (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
  }

But there are other backtraces attributed to the same freed dst_entry in netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g:

  LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via:

  do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if userspace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
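As a rough illustration of the fix described above, the guard amounts to skipping redirect processing while user space owns the socket. The sketch below is a user-space toy, not the kernel code: `struct sock_model`, its fields, and `handle_icmp_redirect` are hypothetical names, and the counter merely stands in for whatever do_redirect() would do to the cached route.

```c
#include <stdbool.h>

/* Toy model of the socket state relevant to the race (hypothetical). */
struct sock_model {
    bool owned_by_user;       /* user space currently holds the socket lock */
    int  redirects_handled;   /* stands in for the work do_redirect() does */
};

/*
 * Process an ICMP redirect for this socket. Returns true if it was
 * handled now, false if it was skipped because the socket is locked
 * and the redirect is left for a later time.
 */
static bool handle_icmp_redirect(struct sock_model *sk)
{
    if (sk->owned_by_user)
        return false;

    sk->redirects_handled++;
    return true;
}
```

The point of the guard is that the route cached on the socket is only touched from one context at a time, so its reference cannot be dropped twice.
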

commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
  #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
 #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
 #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
 #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
 #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
 #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
 #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
 #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
 #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
 #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead.

This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the root cause.

[nborisov: backported to 4.4]

Cc: [email protected]
Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
--
 fs/xfs/xfs_bmap_util.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
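The difference between the two counts can be shown with a small model. This is only a sketch under assumptions, not the XFS code: `struct inode_fork_model`, `struct extent_rec`, the helper names, and the inline capacity of 2 are all illustrative choices; only the field names `di_nextents` and `if_bytes` come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define INLINE_EXTS 2   /* assumed inline capacity, for illustration only */

/* Stand-in for one in-core extent record (two 64-bit words assumed). */
struct extent_rec {
    uint64_t l0, l1;
};

/* Hypothetical model of the two places an extent count can come from. */
struct inode_fork_model {
    uint32_t di_nextents;   /* extent count recorded in the on-disk inode */
    size_t   if_bytes;      /* bytes of extent records held in core */
};

/* Number of extents actually present in the in-core fork. */
static size_t incore_extent_count(const struct inode_fork_model *f)
{
    return f->if_bytes / sizeof(struct extent_rec);
}

/* The check the message argues for: based on if_bytes, not di_nextents. */
static int fits_inline(const struct inode_fork_model *f)
{
    return incore_extent_count(f) <= INLINE_EXTS;
}
```

With a fork whose `di_nextents` is 1 but whose `if_bytes` covers three records (for example because of speculative preallocation), a check on `di_nextents` would wrongly conclude that the extents fit inline, while `fits_inline` returns 0.
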

gpu/drm: hisilicon: Correct 720P's pixel clock

72b15e5

Signed-off-by: Xinliang Liu <[email protected]>

fboudra added a commit that referenced this pull request Mar 11, 2015

Merge pull request #12 from xin3liang/hikey-drm-hdmi-upload

8b22e2c

gpu/drm: hisilicon: Correct 720P's pixel clock to fix 720p didn't work at some monitors

fboudra merged commit 8b22e2c into 96boards:hikey Mar 11, 2015
