2021 WeChat Big Data Challenge (微信大数据挑战赛): https://algo.weixin.qq.com/

For a detailed write-up, see: https://zhuanlan.zhihu.com/p/399218898

Environment dependencies

Python: 3.6

tensorflow-gpu==1.15  # GPU version of TensorFlow
sentencepiece
gensim==3.8.3
pandas
PyYAML
tqdm
matplotlib
sklearn
recordclass
numba

Directory structure

./
├── README.md
├── requirements.txt, Python package requirements
├── init.sh, script for installing package requirements
├── train.sh, script for preparing train/inference data and training models, including pretrained models
├── inference.sh, script for inference
├── src
│   ├── prepare, code for preparing the train/test datasets
│   ├── train, code for training
│   ├── inference.py, main function for inference on the test dataset
│   ├── model, code for the model architecture
├── data
│   ├── wedata, dataset of the competition
│       ├── wechat_algo_data1, preliminary-round dataset (初赛数据集)
│       ├── wechat_algo_data2, semi-final dataset (复赛数据集)
│   ├── submission, prediction results after running inference.sh
│   ├── model, model files (e.g. TensorFlow checkpoints)
│   ├── preprocess, preprocessed data
│   ├── deepwalk, data for the DeepWalk algorithm
│   ├── match_tower, training samples for the Match Tower model
├── config, (optional) configuration files for your method (e.g. yaml file)

How to run

chmod u+x init.sh
chmod u+x inference.sh

./init.sh

./inference.sh

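Models and features

ID features: user ID, device, feedid, authorid, videoplayseconds, description, bgm_song_id, bgm_singer_id, manual_keyword_list, manual_tag_list.

Statistical features: for each user, feed, and author, the total action count and the per-label totals over the previous day and over the previous n days, along with the mean and standard deviation.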
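
The windowed statistics described above can be sketched with pandas. This is a minimal illustration, not the repository's actual preprocessing code: the helper `window_stats`, the column names `userid`, `feedid`, `date_`, and `like`, and the toy data are all assumptions made for the example. Only rows strictly before the target day are aggregated, so the features for a given day do not see that day's labels.

```python
import pandas as pd


def window_stats(actions, key, labels, day, n_days):
    """Aggregate per-`key` statistics over the `n_days` days before `day`.

    Hypothetical helper: column names and layout are assumptions.
    """
    # Keep only rows from the n_days days strictly before `day`.
    window = actions[(actions["date_"] >= day - n_days) & (actions["date_"] < day)]
    grouped = window.groupby(key)

    # Total number of actions per key in the window.
    feats = grouped.size().rename(f"{key}_cnt_{n_days}d").to_frame()

    # Per-label total, mean, and standard deviation.
    for label in labels:
        agg = grouped[label].agg(["sum", "mean", "std"])
        agg.columns = [f"{key}_{label}_{stat}_{n_days}d" for stat in agg.columns]
        feats = feats.join(agg)
    return feats.reset_index()


# Toy interaction log (assumed schema).
actions = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "feedid": [10, 11, 12, 10, 12],
    "date_":  [1, 2, 2, 2, 3],
    "like":   [1, 0, 1, 0, 1],
})

prev_day = window_stats(actions, "userid", ["like"], day=3, n_days=1)
prev_2d = window_stats(actions, "userid", ["like"], day=3, n_days=2)
```

The same helper can be called with `key="feedid"` or an author key to cover the other two entity types; a key with a single row in the window gets a missing (NaN) standard deviation, which would need filling before training.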

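Match Tower model: takes the ID features as input and passes them through a multi-layer MLP.
ALBERT: the user's history sequence is passed through an ALBERT model, which produces one logit as well as a sequence embedding.
PLE model: takes the ID features, the statistical features, and the sequence embedding produced by ALBERT as input, and outputs 7 logits, one per label. These are then fused with the logits from Match Tower and ALBERT to give the final predicted probabilities for the 7 labels.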
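
The README does not say how the logits are fused. The sketch below assumes the simplest option, an element-wise sum followed by a sigmoid, and further assumes that Match Tower and ALBERT each emit a single logit per sample that is broadcast across the 7 labels. The function name `fuse_logits` and the array shapes are hypothetical; the actual fusion in the repository may differ (for example, learned weights).

```python
import numpy as np


def fuse_logits(ple_logits, match_logits, albert_logits):
    """Sum the logits of the three models, then map them to probabilities.

    Assumed shapes: ple_logits (batch, 7); match_logits and
    albert_logits (batch, 1), broadcast over the label axis.
    """
    fused = ple_logits + match_logits + albert_logits
    return 1.0 / (1.0 + np.exp(-fused))  # sigmoid


ple = np.zeros((2, 7))               # 7 per-label logits from PLE
match = np.array([[0.0], [1.0]])     # one logit per sample (assumption)
albert = np.array([[0.0], [-1.0]])   # one logit per sample
probs = fuse_logits(ple, match, albert)
```

Summing in logit space before the sigmoid keeps the combination additive in log-odds, which is one common way to combine auxiliary model scores with a main model.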
