QunBB/WBDC2021

2021 WeChat Big Data Challenge: https://algo.weixin.qq.com/

For a detailed write-up, see: https://zhuanlan.zhihu.com/p/399218898

Environment

Python: 3.6

tensorflow-gpu==1.15  # GPU version of TensorFlow
sentencepiece
gensim==3.8.3
pandas
PyYAML
tqdm
matplotlib
sklearn
recordclass
numba

Directory structure

./
├── README.md
├── requirements.txt, python package requirements 
├── init.sh, script for installing package requirements 
├── train.sh, script for preparing train/inference data and training models, including pretrained models 
├── inference.sh, script for inference 
├── src
│   ├── prepare, code for preparing train/test datasets
│   ├── train, code for training
│   ├── inference.py, main function for inference on the test dataset
│   ├── model, code for model architecture
├── data
│   ├── wedata, dataset of the competition
│       ├── wechat_algo_data1, preliminary dataset (初赛数据集) 
│       ├── wechat_algo_data2, semi-final dataset (复赛数据集)
│   ├── submission, prediction result after running inference.sh
│   ├── model, model files (e.g. tensorflow checkpoints) 
│   ├── preprocess, preprocessed data
│   ├── deepwalk, data for the DeepWalk algorithm
│   ├── match_tower, training samples for the Match Tower model
├── config, (optional) configuration files for your method (e.g. yaml file)

How to run

chmod u+x init.sh
chmod u+x inference.sh

./init.sh

./inference.sh

Models and features

ID features: user ID, device, feedid, authorid, videoplayseconds, description, bgm_song_id, bgm_singer_id, manual_keyword_list, manual_tag_list.

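Statistical features: for each user, feed, and author, the total count and the per-label counts over the previous day and over the previous n days, together with the mean and standard deviation.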
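The windowed statistics described above could be computed roughly as in the following sketch. It is not taken from the repository's `src/prepare` code. The function name `window_stats`, the two-label `LABELS` subset, the column names (`userid`, `date_`, `like`, `read_comment`), and the toy data are all assumptions made for illustration.

```python
import pandas as pd

# Assumption: a subset of the label columns; the full task has 7 labels.
LABELS = ["read_comment", "like"]


def window_stats(actions, key, day, n_days):
    """Aggregate the behaviour of `key` over the n_days days strictly before `day`.

    Returns one row per key value with the interaction count and, for each
    label, its sum, mean and (sample) standard deviation in the window.
    """
    hist = actions[(actions["date_"] >= day - n_days) & (actions["date_"] < day)]
    grouped = hist.groupby(key)
    out = grouped.size().to_frame(f"{key}_cnt_{n_days}d")
    for label in LABELS:
        agg = grouped[label].agg(["sum", "mean", "std"])
        agg.columns = [f"{key}_{label}_{stat}_{n_days}d" for stat in agg.columns]
        out = out.join(agg)
    return out.reset_index()


# Toy interaction log (hypothetical values).
actions = pd.DataFrame({
    "userid": [1, 1, 1, 2],
    "date_": [1, 2, 2, 2],
    "read_comment": [0, 0, 1, 0],
    "like": [1, 0, 1, 0],
})

# Features for samples on day 3, built from the previous day only.
stats = window_stats(actions, key="userid", day=3, n_days=1)
print(stats)
```

Restricting the window to days strictly before `day` keeps the labels of the current day out of its own features. Calling the same function with `key="feedid"` or `key="authorid"` and a larger `n_days` would give the other feature groups, provided those columns exist in the log.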

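  1. Match Tower model: takes the ID features as input and passes them through a multi-layer MLP.
  2. Albert: the history sequence is passed through an ALBERT model, which produces a logits output as well as a sequence embedding.
  3. PLE model: takes the ID features, the statistical features, and the sequence embedding produced by Albert as input, and outputs logits for the 7 labels. These are then fused with the logits from Match Tower and Albert to produce the final predicted probabilities for the 7 labels.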
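The final fusion step can be illustrated with a small NumPy sketch. The README does not state how the three sets of logits are combined, so the additive fusion followed by a sigmoid shown here is an assumption, as are the function names, the tensor shapes, and the random inputs.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse_logits(ple_logits, match_logits, albert_logits):
    """Combine per-label logits from the three models into probabilities.

    Assumption: the logits are simply summed before the sigmoid; the actual
    fusion used in the repository may differ (e.g. learned weights).
    """
    fused = ple_logits + match_logits + albert_logits
    return sigmoid(fused)


batch_size, n_labels = 4, 7  # 7 labels, as in the competition task
rng = np.random.RandomState(0)
ple = rng.randn(batch_size, n_labels)     # hypothetical PLE output
match = rng.randn(batch_size, n_labels)   # hypothetical Match Tower output
albert = rng.randn(batch_size, n_labels)  # hypothetical Albert output

probs = fuse_logits(ple, match, albert)
print(probs.shape)
```

Fusing at the logit level rather than averaging probabilities lets each model shift the decision boundary of the others before the final squashing, which is one common reason to choose this form.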
