
[Announcement] The Medical Big Data Statistical Modeling Innovation Team Holds the First Session of Its Workshop Lecture Series

Date posted: 2019-11-26

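Time: November 28, 2019
Venue: Room 1502, Teaching Building 1 (School of Statistics reading room)
14:00-14:10	(1) Opening ceremony, with remarks by Dean Li Guofeng; (2) the team leader outlines the topics and requirements of this workshop. Chair: He Yong
14:10-14:40	[1] Simultaneous estimation and group pattern recovery for censored linear regression. Chair: He Yong, Shandong University of Finance and Economics. Speaker: Yan Xiaodong, Shandong University
14:40-15:10	[2] A Machine Learning Method for Identifying Critical Interactions between Gene Pairs in Alzheimer's Disease Prediction. Chair: Pei Youquan, Shandong University. Speaker: Ji Jiadong, Shandong University of Finance and Economics
15:10-15:40	[3] Survival mixture model for credit risk analysis. Chair: Yu Yuan, Shandong University of Finance and Economics. Speaker: Pei Youquan, Shandong University
15:40-15:50	Break
15:50-16:20	[4] A Study of Process Control Charts Based on a Robust Likelihood Ratio. Chair: Yan Xiaodong, Shandong University. Speaker: Zhuang Fang, Shandong University of Finance and Economics
16:20-16:50	[5] Application of a Gene-Network-Based Multilevel Bayesian Variable Selection Model in Genome-Wide Association Studies. Chair: Pei Youquan, Shandong University. Speaker: Zhang Xiaoshuai, Shandong University of Finance and Economics
16:50-17:20	[6] Concentration Inequalities and Chaining Arguments. Chair: Ji Jiadong, Shandong University of Finance and Economics. Speaker: Wang Hanchao, Shandong University
17:20-17:30	Team leader He Yong gives closing remarks and outlines the next steps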

Talk Abstracts

Title: A Machine Learning Method for Identifying Critical Interactions between Gene Pairs in Alzheimer's Disease Prediction

Speaker: Ji Jiadong (季加东)

Abstract:

Background: Alzheimer’s disease (AD) is the most common type of dementia. Scientists have found that the causes of AD may include a combination of genetic, lifestyle, and environmental factors, but the exact cause has not yet been elucidated. Hence, effective strategies to prevent and treat AD remain elusive. Research on the genetic causes of AD has mainly focused on individual genes, but growing evidence shows that complex diseases are usually affected by the interaction of genes in a network. Few studies have examined the interactions and correlations between genes and how they are gradually disrupted or lost during AD progression. Differential network analysis has been recognized as an essential tool for identifying underlying pathogenic mechanisms and significant genes for prediction. Hence, we aim to conduct a differential network analysis to reveal potential networks involved in the neuropathogenesis of AD and to identify genes for AD prediction.

Methods: We selected 365 samples from the Religious Orders Study and the Rush Memory and Aging Project, including 193 clinically and neuropathologically confirmed AD subjects and 172 controls with no cognitive impairment (NCI). We then selected 158 genes belonging to the AD pathway (hsa05010) of the Kyoto Encyclopedia of Genes and Genomes. We applied a machine learning method, joint density-based non-parametric differential interaction network analysis and classification (JDINAC), to the gene expression data (RNA-seq data), and searched the RNA-seq data for differential networks associated with a pathological diagnosis of AD. Finally, an optimal prediction model was built through cross-validation; it showed good discrimination and calibration for AD prediction.

Results: We used JDINAC to derive a gene co-expression network and explore the relationship between gene-pair interactions and AD, and identified the top 10 differential gene pairs. We then compared the prediction performance of JDINAC with that of prediction methods based on individual genes. JDINAC provides better classification accuracy than competing methods such as random forest and penalized logistic regression.

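Speaker bio: Ji Jiadong is an associate professor at the School of Statistics, Shandong University of Finance and Economics. Ji's research interests include biomedical statistical methods, high-dimensional data analysis, and graphical models. Ji currently leads one National Natural Science Foundation of China Young Scientists Fund project and one Shandong provincial fund project, and has published more than 20 papers.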

Title: Survival mixture model for credit risk analysis

Speaker: Pei Youquan (裴有权)

Abstract: Credit risk plays a vital role in economics and finance. Survival analysis offers an advanced solution to the credit risk problem because it can not only model whether a borrower defaults but also predict the probability of default over time. To deal with heterogeneous personal loan data, we propose a survival mixture model that clusters customers and predicts the probability that a customer will default. In particular, we propose a new unsupervised learning algorithm to fit a Cox proportional hazards mixture model with an unknown number of components. The approach uses penalized maximum likelihood estimation, carried out by a modified EM algorithm, and can simultaneously estimate the model parameters and the number of mixture components from the data. Numerical experiments on simulated and real data show that the proposed method performs well and provides accurate clustering results.

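Speaker bio: Pei Youquan is an associate research fellow at the School of Economics, Shandong University, and was selected for the fifth cohort of Shandong University's Future Plan for Young Scholars. Pei holds a PhD in statistics from Shanghai University of Finance and Economics and was a postdoctoral fellow at Hong Kong Baptist University. Pei has long worked on econometrics and credit risk management, with results published in leading econometrics and statistics journals including Journal of Econometrics, Science China Mathematics, and Metrika. Pei currently leads one National Natural Science Foundation of China Young Scientists Fund project; main research interests are econometrics, finite mixture models, and credit risk management.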

Title: Concentration Inequalities and Chaining Arguments

Speaker: Wang Hanchao (王汉超)

Abstract: The concentration of measure phenomenon has been an active area of research in recent decades. In this talk, I will present my recent work on concentration inequalities, especially some uniform inequalities obtained by chaining arguments.

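Speaker bio: Wang Hanchao is an associate professor at the Zhongtai Institute for Financial Studies (中泰金融研究院), Shandong University. Wang received a PhD in science from the Department of Mathematics, Zhejiang University; was a postdoctoral researcher in the Department of Physics, Zhejiang University, from 2011 to 2013; and was a postdoctoral researcher in the Department of Statistics, The Chinese University of Hong Kong, from October 2013 to November 2015. Wang has been an invited visitor at the University of Sydney, the National University of Singapore, Moscow State University, and the Steklov Mathematical Institute in Russia. Wang co-authored the monograph Weak Convergence and Its Applications with Professor Lin Zhengyan, and has published more than ten papers in journals including Econometric Theory, Scandinavian Journal of Statistics, Journal of Theoretical Probability, and Journal of Applied Probability.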

Title: Simultaneous estimation and group pattern recovery for censored linear regression

Speaker: Yan Xiaodong (严晓东)

Abstract: In the presence of treatment heterogeneity due to unknown grouping information, standard methods that assume homogeneous treatment effects cannot capture the subgroup structures in the population. To accommodate heterogeneity, we propose a concave fusion approach for identifying subgroup structures and estimating treatment effects in semiparametric linear regression with censored data. In particular, the treatment effects are subject-dependent and subgroup-specific, and our concave fusion penalized method conducts the subgroup analysis without needing to know the group membership of individuals in advance. The proposed estimation procedure automatically identifies the subgroup structure and simultaneously estimates the subgroup-specific treatment effects. Our new algorithm combines the Buckley–James iterative procedure with the alternating direction method of multipliers, and the resulting estimators enjoy the oracle property. Simulation studies demonstrate the good performance of the new method with finite samples subject to right censoring, and a real data example is provided for illustration.

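Speaker bio: Yan Xiaodong was a postdoctoral fellow at The Hong Kong Polytechnic University, holds a PhD in statistics from Yunnan University, and was a research assistant at The Chinese University of Hong Kong. Yan is an associate professor at the School of Economics, Shandong University, a Shandong University Future Scholar, a council member of the High-Dimensional Data Statistics Branch of the Chinese Association for Applied Statistics, and deputy secretary-general of the Shandong Applied Statistics Society. Yan has published more than ten papers, including in the top international economics journal Journal of Econometrics; the top international statistics journals The Annals of Statistics and Journal of the American Statistical Association; and the leading statistics journals Statistica Sinica, Journal of Multivariate Analysis, and Computational Statistics & Data Analysis. Yan currently leads projects funded by the National Natural Science Foundation of China, the Shandong Provincial Natural Science Foundation, the Shandong Provincial Social Science Planning Project fund, and the Shandong Future Plan for Young Scholars fund.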

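Title: Application of a Gene-Network-Based Multilevel Bayesian Variable Selection Model in Genome-Wide Association Studies

Speaker: Zhang Xiaoshuai (张霄帅)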

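Abstract: In recent years, genome-wide association studies (GWAS) have discovered and identified many genetic variants associated with human traits or complex diseases. Most GWAS still follow the traditional approach of testing each single SNP marker for association with disease. However, single-SNP GWAS can explain only a small part of the heritability of complex diseases, which gives rise to the phenomenon known as "missing heritability". Statistical power can therefore be improved only by taking a multi-locus modeling perspective and considering the joint effect of multiple genetic markers on disease, which in turn offers clues for uncovering the genetic basis of complex human diseases. To this end, this study builds a Bayesian multilevel model that performs variable selection at both the SNP level and the gene level simultaneously, using a Markov random field as the prior in variable selection to make full use of gene regulatory network information. A simulation study is designed to compare the ability of three methods (the proposed model, LASSO regression, and stepwise regression) to identify causal loci. The three methods are also applied to genome-wide data on leprosy in a Han Chinese population from a case-control study. The simulation results show that the proposed Bayesian variable selection method outperforms LASSO regression and stepwise regression in identifying causal loci, and it can serve as a new analytical method for genome-wide association studies.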

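Speaker bio: Zhang Xiaoshuai is an associate professor at Shandong University of Finance and Economics whose research focuses on biomedical statistical methods. Zhang currently leads one National Natural Science Foundation of China Young Scientists Fund project and has published several papers in SCI-indexed biostatistics journals.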

Title: A Study of Process Control Charts Based on a Robust Likelihood Ratio

Speaker: Zhuang Fang (庄芳)

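Abstract: In statistical process control, traditional control charts are still widely used in production because they are simple and easy to operate, but they have certain limitations in practice. Traditional control charts are mostly constructed under the assumption of a normally distributed population, whereas data from real production processes mostly do not satisfy the normality condition. By the Neyman-Pearson lemma, a test of simple hypotheses constructed from a monotone likelihood ratio function is uniformly most powerful, but the monotone likelihood ratio condition is hard to satisfy for non-normal populations. In this setting, this work makes the log-likelihood ratio of a non-normal distribution monotone by truncation, constructs a robust likelihood ratio, and uses it to improve the corresponding control charts.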

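Speaker bio: Zhuang Fang is a lecturer at Shandong University of Finance and Economics. Zhuang graduated from Shanghai University of Finance and Economics in June 2019 under the supervision of Professor Wu Chunjie, with research interests in statistical process control and control charts.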