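Topic Evolution
Algorithmic Optimization and Theoretical Innovation in Dynamic Topic Models (DTM)
This group of papers focuses on developing and improving the core algorithms for topic evolution. The work spans the move from discrete-time to continuous-time modeling (cDTM), nonparametric models (HDP), non-negative matrix factorization (NMF), online learning (Online LDA), and semantic enhancement with word embeddings (Word2Vec). The central goals are computational efficiency on large-scale text streams, topic sparsity, and the modeling of long- and short-term dependencies.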
- Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream(Amr Ahmed, Eric P. Xing, 2012, arXiv (Cornell University))
- Learning Methods for Dynamic Topic Modeling in Automated Behavior Analysis(Olga Isupova, Danil Kuzin, Lyudmila Mihaylova, 2017, IEEE Transactions on Neural Networks and Learning Systems)
- Time-Varying Dynamic Topic Model(Jun Han, Yu Huang, Kuldeep Kumar, Sukanto Bhattacharya, 2017, Journal of Global Information Management)
- Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling(Changyou Chen, Nan Ding, Wray Buntine, 2012, arXiv (Cornell University))
- Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach(Derek Greene, James Cross, 2017, Political Analysis)
- Partition-then-Overlap Method for Labeling Cyber Threat Intelligence Reports by Topics over Time(Ryusei Nagasawa, Keisuke Furumoto, Makoto Takita, Yoshiaki Shiraishi, Takeshi Takahashi, Masami Mohri, Yasuhiro Takano, Masakatu Morii, 2021, IEICE Transactions on Information and Systems)
- An integrated latent Dirichlet allocation and Word2vec method for generating the topic evolution of mental models from global to local(Jian Ma, Lei Wang, Yuan-Rong Zhang, Wei Yuan, Wei Guo, 2022, Expert Systems with Applications)
- Topic Identification and Evolution Based on an Improved LDA Model: The Case of Open-Source Software(高翔菲, 董平军, 2025, 数据挖掘)
- Topic Evolution in a Stream of Documents(André Gohr, Alexander Hinneburg, René Schult, Myra Spiliopoulou, 2009, No journal)
- Online multiscale dynamic topic models(Tomoharu Iwata, Takeshi Yamada, Yasushi Sakurai, Naonori Ueda, 2010, No journal)
- Scalable Generalized Dynamic Topic Models(Patrick Jähnichen, Florian Wenzel, Marius Kloft, Stephan Mandt, 2018, arXiv (Cornell University))
- A nonparametric mixture model for topic modeling over time(Avinava Dubey, Ahmed Hefny, Sinead A. Williamson, Eric P. Xing, 2013, No journal)
- On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking(Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi, 2008, No journal)
- Scaling up Dynamic Topic Models(Arnab Bhadury, Jianfei Chen, Jun Zhu, Shi‐Xia Liu, 2016, No journal)
- Dynamic topic modeling via self-aggregation for short text streams(Lei Shi, Junping Du, Meiyu Liang, Feifei Kou, 2018, Peer-to-Peer Networking and Applications)
- An alternative topic model based on Common Interest Authors for topic evolution analysis(Sukhwan Jung, Wan Chul Yoon, 2020, Journal of Informetrics)
- On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition(Miju Ahn, Nicole Eikmeier, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Kathryn Leonard, Deanna Needell, R. W. M. A. Madushani, Elena Sizikova, Chuntian Wang, 2021, Association for Women in Mathematics series)
- Dynamic topic models(David M. Blei, John Lafferty, 2006, No journal)
- Continuous Time Dynamic Topic Models(Chong Wang, David M. Blei, David Heckerman, 2012, arXiv (Cornell University))
- Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends(Xuerui Wang, Andrew McCallum, 2006, Scholarworks (University of Massachusetts Amherst))
- Topics over time(Xuerui Wang, Andrew McCallum, 2006, No journal)
- Modeling Topic Evolution in Social Media Short Texts(Yuhao Zhang, Wenji Mao, Junjie Lin, 2017, No journal)
- Topic Tracking with Dynamic Topic Model and Topic-based Weighting Method(Xiaoyan Zhang, Ting Wang, 2010, Journal of Software)
- CHANGE OF TOPICS OVER TIME - Tracking Topics by their Change of Meaning(Gerhard Heyer, Florian Holz, Sven Teresniak, 2009, No journal)
- Author-Topic over Time (AToT): A Dynamic Users’ Interest Model(Shuo Xu, Qingwei Shi, Xiaodong Qiao, Lijun Zhu, Hanmin Jung, Seungwoo Lee, Sung-Pil Choi, 2013, Lecture notes in electrical engineering)
- Application of dynamic topic models to toxicogenomics data(Mi‐Kyung Lee, Zhichao Liu, Ruili Huang, Weida Tong, 2016, BMC Bioinformatics)
- Generation of topic evolution trees from heterogeneous bibliographic networks(Scott A. Jensen, Xiaozhong Liu, Yingying Yu, Staša Milojević, 2016, Journal of Informetrics)
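Scientometrics and the Evolution of Disciplinary Research
This group of papers applies topic evolution to scholarly big data. By analyzing journal articles, patents, citation networks, and preprints, the studies trace the history, shifting hot topics, knowledge transfer patterns, and evolving collaboration within specific disciplines (such as bioinformatics, traditional Chinese medicine, computational linguistics, and climate change), with the aim of providing visual navigation of research frontiers.
- A Review of Basic Research by Chinese Enterprises Based on the BERTopic Model(俞梦婷, 2025, 社会科学前沿)
- Topic Identification and Evolution Trends in E-Commerce Live-Streaming Research Based on the BERTopic Model(张昊翔, 2026, 电子商务评论)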
- Studying the history of ideas using topic models(David Hall, Daniel Jurafsky, Christopher D. Manning, 2008, No journal)
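- Topic Evolution in the Literature on Integrating Medical Treatment and Prevention for Chronic Diseases in China: A Visual Analysis Based on Word2vec and LDA(李艳, 唐岚, 2025, 临床医学进展)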
- Ten Years of Research on the Water-Energy-Food Nexus: An Analysis of Topics Evolution(Lira Luz Benites Lázaro, Rodrigo A. Bellezoni, José A. Puppim de Oliveira, Pedro Roberto Jacobi, Leandro Luiz Giatti, 2022, Frontiers in Water)
- Hot Topics and Frontier Evolution of Text Mining in China: A Visual Analysis of Literature Indexed in CNKI(王鑫, 徐江南, 沈江明, 2020, 统计学与应用)
- Evaluation of the evolution of relationships between topics over time(Wolfgang Gaul, Dominique Vincent, 2016, Advances in Data Analysis and Classification)
- Dynamic Topic Evolution Trends in International Writing Assessment Research(汤小艺, 2025, 现代语言学)
- A Text Mining Based Map of Engineering Design: Topics and their Trajectories Over Time(Filippo Chiarello, Nicola Melluso, Andrea Bonaccorsi, Gualtiero Fantoni, 2019, Proceedings of the ... International Conference on Engineering Design)
- Detecting topic evolution in scientific literature(Qi He, Bi Yu Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, C. Lee Giles, 2009, No journal)
- A lead‐lag analysis of the topic evolution patterns for preprints and publications(Beibei Hu, Xianlei Dong, Chenwei Zhang, Timothy D. Bowman, Ying Ding, Staša Milojević, Chaoqun Ni, Erjia Yan, Vincent Larivière, 2015, Journal of the Association for Information Science and Technology)
- Topic Extraction and Evolution in Traditional Chinese Medicine Based on Text Mining(张恒, 2025, 应用数学进展)
- Climate impacts to inland fishes: Shifting research topics over time(Abigail J. Lynch, A. D’Isanto, Julian D. Olden, Cindy Chu, Craig P. Paukert, Daria Gundermann, Mitchel Lang, Ray Zhang, Trevor J. Krabbenhoft, 2023, PLOS Climate)
- Topic Evolution and Emerging Topic Analysis Based on Open Source Software(Xiang Shen, Li Wang, 2020, Journal of Data and Information Science)
- Research Collaboration and ITS Topic Evolution: 10 Years at T-ITS(Linjing Li, Xin Li, Changjian Cheng, Cheng Chen, Guanyan Ke, Daniel Zeng, William T. Scherer, 2010, IEEE Transactions on Intelligent Transportation Systems)
- Understanding the topic evolution of scientific literatures like an evolving city: Using Google Word2Vec model and spatial autocorrelation analysis(Kai Hu, Qing Luo, Kunlun Qi, Siluo Yang, Jin Mao, Xiaokang Fu, Jie Zheng, Huayi Wu, Ya Guo, Qibing Zhu, 2019, Information Processing & Management)
- Analyzing topic evolution in bioinformatics: investigation of dynamics of the field with conference data in DBLP(Min Song, Go Eun Heo, Su Yeon Kim, 2014, Scientometrics)
- A Trend Analysis of Significant Topics Over Time in Machine Learning Research(Deepak Sharma, Bijendra Kumar, Satish Chand, Rajiv Ratn Shah, 2021, SN Computer Science)
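Public Opinion Monitoring, Emergencies, and Public Policy Change
This group of papers addresses text evolution from a social science perspective. One strand monitors public opinion on emergencies across social media (Twitter, Weibo, Douyin), including sentiment fluctuations and information diffusion patterns. The other analyzes policy texts such as government gazettes and parliamentary debates to track how political agendas, the "Double Reduction" policy, science and technology innovation policy, and other macro-level governance logics evolve.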
- Topic evolution and social interactions(Ding Zhou, Xiang Ji, Hongyuan Zha, C. Lee Giles, 2006, No journal)
- Measuring User Influence in Twitter: The Million Follower Fallacy(Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, Krishna P. Gummadi, 2010, Proceedings of the International AAAI Conference on Web and Social Media)
- Microblog topic evolution computing based on LDA algorithm(Jian Feng, Yajiao Wang, Yuanyuan Ding, 2018, Open Physics)
- Leveraging Social Context for Modeling Topic Evolution(Janani Kalyanam, Amin Mantrach, Diego Sáez-Trumper, Hossein Vahabi, Gert Lanckriet, 2015, No journal)
- Dynamic topic modeling of twitter data during the COVID-19 pandemic(Alexander Bogdanowicz, ChengHe Guan, 2022, PLoS ONE)
- Hashtag-based topic evolution in social media(Md. Hijbul Alam, Woo-Jong Ryu, SangKeun Lee, 2017, World Wide Web)
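- Topic Mining and Evolution Analysis of Public Perception of Unmanned Delivery Based on BERTopic(杨文清, 陈静远, 2026, 现代管理)
- Evolution Paths of Trending Social Events Based on the OLDA Model(周莎莎, 2024, 统计学与应用)
- Topic Evolution of Online Public Opinion Based on Sentiment Analysis: The Case of the "Double Reduction" Policy(金百川, 曹旭, 2022, 数据挖掘)
- Evolution of China's E-Commerce Policy Based on Text Mining(张千禧, 2025, 电子商务评论)
- Text Feature Analysis and Topic Mining of Education Burden-Reduction Policies under "Double Reduction"(赖思棋, 2025, 社会科学前沿)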
- An Automated Method of Topic-Coding Legislative Speech Over Time with Application to the 105th-108th U.S. Senate(Kevin M. Quinn, Burt L. Monroe, Michael P. Colaresi, Michael H. Crespin, Dragomir Radev, 2006, No journal)
- Parliament's Debates about Infrastructure: An Exercise in Using Dynamic Topic Models to Synthesize Historical Change(Jo Guldi, 2019, Technology and Culture)
- Meme-tracking and the dynamics of the news cycle(Jure Leskovec, Lars Bäckström, Jon Kleinberg, 2009, No journal)
- Dynamic topic modeling of the COVID-19 Twitter narrative among U.S. governors and cabinet executives(Hao Sha, Mohammad Al Hasan, George Mohler, P. Jeffrey Brantingham, 2020, arXiv (Cornell University))
- Covid-19 Discourse on Twitter: How the Topics, Sentiments, Subjectivity, and Figurative Frames Changed Over Time(Philipp Wicke, Marianna Bolognesi, 2021, Frontiers in Communication)
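- Mechanisms Linking Topics, Emotions, and User Behavior in Online Public Opinion on Food Safety: The Case of the "Rat Head or Duck Neck" Incident(史沁阳, 2024, 运筹与模糊学)
- Topic Evolution and Actor Characteristics of Online Public Opinion Based on Text Analysis(许文晴, 王成龙, 沈惠璋, 2023, 新闻传播科学)
- Evolution of Science and Technology Innovation Policy in Xi'an(田杰, 毕云清, 姬浩, 2026, 可持续发展)
- Topic Evolution and Optimization Paths of Smart Health and Elderly Care Policy under Digital-Intelligent Transformation: A Text Analysis Based on the LDA Model(何慧丽, 2025, 电子商务评论)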
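Empirical Applications in Vertical Industries and Market Trend Research
This group of papers demonstrates the value of topic models in specific industries and markets. Scenarios include software engineering (commit analysis), financial market volatility forecasting, cryptocurrency communities, e-commerce live streaming, research on well-being among older adults, customer experience in tourism and hospitality, and energy security (the water-energy-food nexus).
- Temporal Topic Evolution of Subjective Well-Being among Older Adults Based on the LDA2vec Model(陈婉铭, 2025, 运筹与模糊学)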
- Topic evolution, disruption and resilience in early COVID-19 research(Yi Zhang, Xiaojing Cai, Caroline Fry, Mengjia Wu, Caroline S. Wagner, 2021, Scientometrics)
- Modeling the evolution of development topics using Dynamic Topic Models(Jiajun Hu, Xiaobing Sun, David Lo, Bin Li, 2015, No journal)
- Modeling Skill Acquisition Over Time with Sequence and Topic Modeling(José P. González-Brenes, 2015, International Conference on Artificial Intelligence and Statistics)
- Forecasting Financial Market Volatility Using a Dynamic Topic Model(Takayuki Morimoto, Yoshinori Kawasaki, 2017, Asia-Pacific Financial Markets)
- Who cares about coal? Analyzing 70 years of German parliamentary debates on coal with dynamic topic modeling(Finn Müller-Hansen, Max Callaghan, Yuan Ting Lee, Anna Leipprand, Christian Flachsland, Jan C. Minx, 2021, Energy Research & Social Science)
- Concept over time: the combination of probabilistic topic model with wikipedia knowledge(Liang Yao, Yin Zhang, Baogang Wei, Lei Li, Fei Wu, Peng Zhang, Yali Bian, 2016, Expert Systems with Applications)
- Dynamic Topic Modeling Reveals Variations in Online Hate Narratives(Richard Sear, Nicholas J. Restrepo, Yonatan Lupu, Neil F. Johnson, 2022, Lecture notes in networks and systems)
- Semantic-enhanced topic evolution analysis: a combination of the dynamic topic model and word2vec(Qiang Gao, Xiao Huang, Ke Dong, Zhentao Liang, Jiang Wu, 2022, Scientometrics)
- Industry And Skill Wage Premiums In East Asia(Emanuela di Gropello, Chris Sakellariou, 2010, World Bank eBooks)
- Evolution of Textile Technology Topic Content Based on Entity Recognition(Unknown Authors, 2025, 管理科学与工程)
- The mutually beneficial relationship of patents and scientific literature: topic evolution in nanoscience(Qi Yashuang, Na Zhu, Yujia Zhai, Ying Ding, 2018, Scientometrics)
- Topic evolution based on LDA and HMM and its application in stem cell research(Qingqiang Wu, Caidong Zhang, Qingqi Hong, Liyan Chen, 2014, Journal of Information Science)
- Understanding Cybersecurity Threat Trends Through Dynamic Topic Modeling(Jennifer Sleeman, Tim Finin, Milton Halem, 2021, Frontiers in Big Data)
- Exploring the Impact of Online Reviews on Box Office Revenue: A Multidimensional Sentiment Perspective(别春洋, 陶贻勇, 2023, 现代管理)
- Analysing online customer experience in hotel sector using dynamic topic modelling and net promoter score(Van-Ho Nguyen, Thanh Ho, 2023, Journal of Hospitality and Tourism Technology)
- Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data(Hao Zhang, Gunhee Kim, Eric P. Xing, 2015, No journal)
- Dynamic Topic Modelling for Cryptocurrency Community Forums(Marie Larsson Linton, Ernie G. S. Teo, Elisabeth Bommes, Cheng–Ying Chen, Wolfgang Karl Härdle, 2017, Statistics and Computing)
- Examining user perceptions of smartwatch through dynamic topic modeling(Taehyun Ha, Bjorn Beijnon, Sangyeon Kim, Sangwon Lee, Jang Hyun Kim, 2017, Telematics and Informatics)
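Multidimensional Feature Association, Visualization Systems, and Personalized Analysis
This group of studies emphasizes how topic evolution is presented and analyzed in combination with other data. It includes interactive visualization systems (D-VITA, WordStream) that support data exploration, the integration of geospatial information (GPS/IP) and cross-modal information (video/text), and the use of topic evolution to capture changes in user interests, enabling personalized recommendation and the evaluation of educational outcomes.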
- A Dynamic Topic Model and Matrix Factorization-Based Travel Recommendation Method Exploiting Ubiquitous Data(Zhenxing Xu, Ling Chen, Yimeng Dai, Gencai Chen, 2017, IEEE Transactions on Multimedia)
- A Hot News Recommendation Algorithm Based on Topic Models(曾知涧, 王利洪, 燕旭飞, 2019, 计算机科学与应用)
- Text Mining and Sentiment Analysis of Dynamic Changes in Mobile Learning Engagement(肖巍, 李金凤, 2021, 创新教育研究)
- Personal stories matter: topic evolution and popularity among pro- and anti-vaccine online articles(Zhan Xu, 2019, Journal of Computational Social Science)
- A Joint Model for Topic-Sentiment Evolution over Time(Mohamed Dermouche, Julien Velcin, Leila Khouas, Sabine Loudcher, 2014, No journal)
- An Interactive System for Visual Analytics of Dynamic Topic Models(Nikou Günnemann, Michael Derntl, Ralf Klamma, Matthias Jarke, 2013, Datenbank-Spektrum)
- A Dynamic Topic Model of Learning Analytics Research(Michael Derntl, Nikou Günnemann, Ralf Klamma, 2013, RWTH Publications (RWTH Aachen))
- Generation of topic evolution graphs from short text streams(Wang Gao, Min Peng, Hua Wang, Yanchun Zhang, Weiguang Han, Gang Hu, Qianqian Xie, 2019, Neurocomputing)
- Tracking Topic Evolution in News Environments(Maximilian Viermetz, Michał Skubacz, Cai-Nicolas Ziegler, Dietmar Seipel, 2008, No journal)
- Understanding the topic evolution in a scientific domain: An exploratory study for the field of information retrieval(Baitong Chen, Satoshi Tsutsui, Ying Ding, Feicheng Ma, 2017, Journal of Informetrics)
- A Dynamic Probabilistic Model to Visualise Topic Evolution in Text Streams(Ata Kabán, Mark Girolami, 2002, Journal of Intelligent Information Systems)
- Building and Exploring Dynamic Topic Models on the Web(Michael Derntl, Nikou Günnemann, Alexander Tillmann, Ralf Klamma, Matthias Jarke, 2014, No journal)
- On Dynamic Topic Models for Mining Social Media(Shatha Jaradat, Mihhail Matskin, 2018, Lecture notes in social networks)
- Tracking online topics over time: understanding dynamic hashtag communities(Philipp Lorenz-Spreen, Frederik Wolf, Jonas Braun, Gourab Ghoshal, Nataša Djurdjevac Conrad, Philipp Hövel, 2018, Computational Social Networks)
- WordStream: Interactive Visualization for Topic Evolution(Tommy Dang, Huyen N. Nguyen, Vung Pham, 2019, Eurographics)
- Tracking urban geo-topics based on dynamic topic model(Fang Yao, Yan Wang, 2019, Computers Environment and Urban Systems)
- Video Behaviour Mining Using a Dynamic Topic Model(Timothy M. Hospedales, Shaogang Gong, Tao Xiang, 2011, International Journal of Computer Vision)
- The diversity of canonical and ubiquitous progress in computer vision: A dynamic topic modeling approach(Wen Lou, Jie Meng, 2022, Information Processing & Management)
- Location-based topic evolution(Haiqin Yang, Shouyuan Chen, Michael R. Lyu, Irwin King, 2011, No journal)
- Temporal relation co-clustering on directional social network and author-topic evolution(Wei Peng, Tao Li, 2010, Knowledge and Information Systems)
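This report divides topic evolution research into five dimensions: theoretical optimization of the underlying models and algorithms (DTM, online learning, and semantic enhancement); empirical scientometric studies (disciplinary trends and knowledge maps); dynamic monitoring of public opinion and public policy (social media and government texts); commercial applications in vertical industries (finance, software, healthcare, and similar settings); and multidimensional data association with interactive visual analysis (system development, geospatial data, and user personalization). Overall, the field is moving from single-text topic analysis toward multimodal and spatiotemporal fusion, and from general-purpose models toward domain adaptation and interactive visualization systems.
104 related papers in total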
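This study uses the bibliometric visualization tool SciMAT to analyze the dynamic topic evolution of writing assessment literature indexed in the Web of Science Core Collection from 2003 to 2023. Drawing on keyword evolution overlay maps, cluster strategic diagrams, and dynamic evolution path maps, the study finds that the field is flourishing and has formed ten evolution paths across seven research directions: "syntactic complexity", "student perception", "validity", "automated writing evaluation", "teaching strategies", "second language writing", and "performance assessment". Future research on writing assessment should adopt interdisciplinary methods that combine rule-based rationalism with empiricism grounded in large-scale language data, deconstruct the multidimensional, multilevel construct of "writing quality", and broaden the content and perspectives of the field.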
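With the rapid growth of the Internet, online news has become the main way people obtain information, and providing accurate personalized news recommendations is a growing concern in the industry. Many LDA-based news recommendation methods have been proposed, but they analyze only news content and ignore changes in user interest. To address this, the paper proposes a hot news recommendation algorithm based on changes in topic interest. First, the user's reading history is divided into fixed-size time windows, and LDA is applied to the history in each window to obtain a probability distribution over the user's interests. Second, a time-penalty weighting function is combined with the user's per-window news topic distributions to predict the user's likely interests in the next window. Finally, hot news is recommended using user-based collaborative filtering on the interest distributions together with the topic distributions of candidate news items. Experiments on a real dataset show that the method improves recommendation performance.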
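The second step of this abstract, weighting per-window interest distributions by a time penalty, can be illustrated with a minimal sketch. The abstract does not give the form of the penalty function, so the exponential decay used here, the `decay` parameter, and the function name `predict_next_interest` are all assumptions made for illustration rather than the authors' method.

```python
import math


def predict_next_interest(window_dists, decay=0.5):
    """Estimate next-window topic interest from per-window topic distributions.

    window_dists: topic distributions ordered from oldest to newest window.
    decay: hypothetical penalty rate; larger values discount old windows more.
    """
    n = len(window_dists)
    # Assumed penalty: the newest window has weight 1, older ones decay exponentially.
    weights = [math.exp(-decay * (n - 1 - i)) for i in range(n)]
    total = sum(weights)
    combined = [0.0] * len(window_dists[0])
    for w, dist in zip(weights, window_dists):
        for k, p in enumerate(dist):
            combined[k] += (w / total) * p
    return combined


# Toy reading history over three windows and two topics (made-up numbers).
history = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
predicted = predict_next_interest(history, decay=1.0)
```

Because the weights are normalized, the result is again a probability distribution, and recent windows dominate the estimate.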
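Objective: To address limitations of LDA-based topic identification and evolution analysis, notably the difficulty of choosing the number of topics and the subjectivity of time window partitioning, and thereby advance these methods. Methods: Topic vectors are computed by combining the TF-IDF algorithm with Word2Vec word embeddings, which reduces the influence of common words during topic generation and gives the topic vectors a semantic representation. For topic evolution, time windows are partitioned according to changes in the semantic distance between topics, and the evolution of topic strength and topic content in the target field is tracked. An empirical study is conducted on the research literature on open-source software. Results: The proposed method effectively identifies research topics and hot topics in the field, tracks the paths along which topics evolve over time, and presents them visually. Conclusions: Open-source software research has six key topics, of which "open-source governance" and "market competition" are the hot topics. In terms of content, research is shifting from the self-governing motivations of individual voluntary participants toward participation at the organizational level by enterprises and governments.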
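One plausible reading of the method above is that each topic is represented by a TF-IDF-weighted average of its words' embeddings, and that the semantic distance between a topic's vectors in adjacent periods signals where a window boundary belongs. The sketch below follows that reading. The exact weighting and distance used in the paper are not stated in the abstract, so the helper names, the cosine distance, and the toy embeddings are assumptions.

```python
import math


def topic_vector(term_weights, embeddings):
    """Weighted average of word vectors; the weights stand in for TF-IDF scores."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for term, weight in term_weights.items():
        if term not in embeddings:
            continue  # skip out-of-vocabulary terms
        total += weight
        for i, x in enumerate(embeddings[term]):
            vec[i] += weight * x
    return [x / total for x in vec] if total else vec


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Toy 2-d embeddings and TF-IDF weights, invented for illustration only.
embeddings = {"license": [1.0, 0.0], "community": [0.8, 0.6], "market": [0.0, 1.0]}
topic_early = topic_vector({"license": 0.9, "community": 0.4}, embeddings)
topic_late = topic_vector({"community": 0.5, "market": 0.8}, embeddings)
drift = cosine_distance(topic_early, topic_late)
```

A large `drift` between consecutive periods would suggest a content shift worth a new time window; the threshold for "large" is a modeling choice the abstract leaves open.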
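As new quality productive forces rapidly reshape the logistics industry, understanding the deep structure and evolution of public perception of unmanned delivery technology is important for integrating artificial intelligence with modern logistics. This study combines comment data from Douyin and Weibo and applies the Transformer-based BERTopic dynamic topic model to characterize public perception and its evolution. The model identifies core issues such as voice technology and fresh food transport, which hierarchical clustering groups into four dimensions: economic benefits and social ethics; delivery scenarios and service quality; perception of the technology itself and evaluation of innovation; and environmental interaction and safety adaptation. The evolution path reveals a paradigm shift through four periods: technology accumulation with stable public opinion; early demonstration scenarios and application exploration; opening of road rights with adjustment of commercial exploration; and a surge driven by new quality productive forces. The findings provide data-driven empirical support for iterating the technology paradigm, building the scenario ecosystem, and optimizing industrial policy for unmanned delivery.
This paper selects 121,603 online reviews from 2017 to 2021 and introduces multi-attribute attitude theory into the empirical analysis, examining the effect of online reviews on box office revenue from the new perspective of multidimensional sentiment. DTM (Dynamic Topic Models) and sentiment mining are used to extract dimension-specific sentiment from the reviews, and quantile regression is then used to analyze how multidimensional sentiment affects box office revenue. The results show that sentiment along three dimensions (stars, genre, and plot) has a positive effect on box office revenue. Specifically, the effect of stars follows an inverted U shape, the effect of plot increases at higher quantiles, and the effect of genre is concentrated at the middle quantiles. Sentiment variance negatively moderates the effect of the three dimension-specific sentiments on box office revenue. The study adds to empirical research on online reviews and film marketing and offers managerial implications and practical insights based on the results.
Attention to the subjective well-being of older adults is important for responding actively to population aging. This paper applies the LDA2vec topic model to articles on the subjective well-being of older adults in the CNKI database, combining the TF-IDF algorithm, the LDA model, and the Word2vec word embedding model to mine the core topics of well-being among older adults and their evolution paths over time. Four topics are obtained: "the relationship between elderly care models and social evolution", "social relationships and older adults in remote areas", "well-being of older adults from a cross-cultural perspective", and "health and digital aging". Topic popularity is computed to obtain popularity trends over the past five years. The inflection time and first publication time of each topic are also discussed, and topic evolution across three time windows is visualized to show the topic structure and evolution trends of the literature. In terms of research hot spots, "mental health" and "social support" are important topics in the field. Overall, cross-fertilization among topics continues to develop and research topics are becoming more diverse.
Purpose/Significance: To reflect the temporal patterns of topic evolution in online public opinion during sudden economic events and the characteristics of key participants, to guide the companies involved in responding to online public opinion crises, and to offer a reference method for identifying the topics of public concern and analyzing how they evolve. Method/Process: Taking the sudden economic event "the halting of M Group's IPO" as a case, the paper analyzes related Weibo posts using the LDA topic model and K-means clustering. Building on topic identification and on the life cycle theory of public opinion development, it studies the hot topics and actor categories at different stages of public opinion spread. Results/Conclusions: The study reveals the topics of public concern and their evolution at each stage of the public opinion life cycle, as well as the characteristics of the actors that play a leading role in driving that evolution, and offers managerial implications for companies formulating response strategies.
This paper uses a crawler written in Python to collect Weibo comments on the "Double Reduction" policy and analyzes their sentiment with SnowNLP. Topic analysis with the LDA model is then run separately on positive and negative comments, yielding 13 topic keywords, including "policy implementation effects", "time allocation", "career changes among tutoring teachers", "school reopening during the pandemic", "educational equity", "teacher compensation", "parental companionship", "academic burden reduction", "interest cultivation", "talent cultivation", and "education reform". The analysis indicates that the policy has substantially changed learning patterns for students in compulsory education and has effectively curbed extracurricular tutoring institutions, which reflects the policy's forcefulness. In the long run, the policy effectively advances the strategy of strengthening the nation through talent and supports the healthy development of education.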
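To study the development trends and evolution of research on integrating medical treatment and prevention for chronic diseases in China, this paper collects 373 relevant papers published from 2006 to 2024. After data cleaning and preprocessing, an LDA model augmented with Word2vec is used for topic mining, the optimal number of topics is determined for each period, and a Sankey diagram of topic evolution is generated. Topic strength is computed for each time period, and hot topics are described with interactive bar charts. The results show that in the first stage (2006 to 2020), most research focused on how to integrate medical services and how to combine chronic disease prevention and control with medical treatment and prevention. In the second stage (2021 to 2022), besides continuing existing topics, part of the focus shifted to better management and integration of comprehensive medical services and to more effective combination of public health services with the medical system. In the third stage (2023 to 2024), research concentrated on deep integration of health services with medical treatment and prevention and on putting the integration concept into practice in medical services, with greater emphasis on operations and concrete applications. The topic evolution analysis reveals the links among topics across periods: topics such as comprehensive medical services and the combination of chronic disease control with medical treatment and prevention show strong continuity across stages, while the research focus gradually shifts from comprehensive medical services toward medical-preventive integration and health service management. Some topics maintain high strength across periods. The topic strength charts show that community-level primary care institutions play an important role in medical-preventive integration, and that from 2021 onward public health system building and medical-preventive integration became shared hot topics. The study contributes to a fuller understanding of research dynamics in this field, provides a reference for future research directions and policymaking, and offers a practical demonstration of text analysis methods. Future research could further explore coordination mechanisms between primary care and medical-preventive work, as well as health service management and chronic disease control, to better help community-level primary care providers meet the challenges of high chronic disease prevalence and diverse health needs in an aging society. It should also attend to emerging technologies such as artificial intelligence and big data analytics, the associated data privacy and ethical challenges, and the risks of policy implementation.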
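Topic strength per period, as used in this abstract, is often computed as the mean topic proportion over the documents in that period. The abstract does not state its exact definition, so the sketch below assumes that common one; the function name and the toy document-topic values are hypothetical.

```python
from collections import defaultdict


def topic_strength_by_period(doc_topics, doc_periods):
    """Mean topic proportion per period.

    doc_topics: one topic distribution per document (e.g. from LDA).
    doc_periods: the period label of each document, in the same order.
    """
    sums = {}
    counts = defaultdict(int)
    for dist, period in zip(doc_topics, doc_periods):
        if period not in sums:
            sums[period] = [0.0] * len(dist)
        for k, p in enumerate(dist):
            sums[period][k] += p
        counts[period] += 1
    return {
        period: [s / counts[period] for s in total]
        for period, total in sums.items()
    }


# Toy document-topic proportions for two topics (made-up numbers).
doc_topics = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
doc_periods = ["2006-2020", "2006-2020", "2021-2022"]
strength = topic_strength_by_period(doc_topics, doc_periods)
```

Plotting each topic's value across periods gives the kind of strength trend the abstract describes.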
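Set against the deepening digital-intelligent transformation, this study applies the LDA topic model to smart health and elderly care policy texts issued by central and provincial governments from 2011 to 2024. After preprocessing such as word segmentation and stop word removal, the optimal number of topics is determined using perplexity, and six core topics are identified: innovative development of the Internet Plus healthcare industry; optimization of the community and home-based elderly care service system; development and promotion of smart elderly care products; medical big data platforms and informatization; improvement of informatization capacity in primary care institutions; and construction of the service ecosystem for smart elderly care institutions. Further analysis shows that policy topic strength evolves in three distinct stages: the 12th Five-Year Plan period emphasized basic informatization and home and community services; the 13th Five-Year Plan period expanded to smart product development, medical data applications, and early cultivation of the industrial ecosystem; and the 14th Five-Year Plan period focuses on deep integration of digital-intelligent technologies and the systematic construction of new business forms, indicating a shift from piecemeal construction toward systematic, ecosystem-oriented development. Based on these findings, the study proposes optimization paths along four dimensions: strengthening technology application and data governance capacity, integrating smart health and elderly care service resources, unlocking market potential, and improving policy coordination and implementation mechanisms. These provide a decision reference and practical basis for building a more precise, efficient, and sustainable smart health and elderly care system.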
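For reference, the perplexity criterion mentioned above is usually defined as follows for a held-out set of $M$ documents, where $\mathbf{w}_d$ is the word sequence of document $d$ and $N_d$ its length. The notation is the standard one from the LDA literature and is an assumption here, since the abstract gives no formula.

```latex
\mathrm{perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```

Lower perplexity indicates better predictive fit, so a typical procedure fits models over a range of topic counts and picks the count at or near the minimum or elbow of the curve.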
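Patent texts are a core building block of technological innovation, and topic analysis of their content helps clarify the distribution and evolution of technology topics. Taking patents on textile fabric preparation technology in CNKI from 2018 to 2022 as the research object, the study uses named entity recognition to extract object-type entities as the basis for content analysis, divides time windows by year, and uses perplexity and topic variance to obtain the optimal number of topics. Analyzing how the content of technology topics evolves yields a summary of innovation patterns in textile fabric preparation; the evolution is organized into three groups of technical elements (fabric raw materials, fabric preparation processes, and fabric properties), and suggestions for further fabric development are given. To overcome the difficulty of selecting word clusters that represent topics accurately and quickly, named entity recognition is used to simplify technical term extraction, and the knowledge-enhanced pretrained model ERNIE 3.0 is used to quickly obtain a set of technical terms with strong summarizing power.
Enterprises are an important vehicle for leading and advancing basic research in China, and a systematic analysis of the literature can provide a theoretical basis for future research. Using papers published in CSSCI journals from 2010 to 2024 as the text data source, this paper applies the BERTopic model and uses text mining and topic identification to explore hot topics and topic evolution trends. The results show that enterprise basic research in China comprises 25 latent research topics, which can be merged into six major themes: industry-university-research integration, government incentive policy, leadership capability, resource allocation, market demand, and industrial ecosystem building. The hot themes are industry-university-research integration, government incentive policy, leadership capability, and resource allocation. In terms of evolution, the hot themes run through the whole history of the field; topic evolution shows steadily growing popularity and progressively deeper scope; the dominant themes alternate across periods; and some later themes overtake earlier ones in popularity, reflecting the characteristics of their times.
The rapid growth of e-commerce live streaming is reshaping e-commerce, driving business model innovation and improving operational efficiency. However, existing research concentrates on specific application scenarios and lacks a systematic study of overall trends and core research topics. To fill this gap, this study applies BERTopic topic modeling to 884 papers published in the CNKI database from 2016 to 2026 and systematically analyzes the key research topics and evolution trends in the field. It identifies 16 core topics, grouped into six categories including consumer behavior and decision-making, rural revitalization and agricultural product live streaming, and media convergence and content dissemination. Research in the field has passed through three stages, from basic exploration through deepening development to innovative expansion, and technology empowerment and standardization are the current core trends. The study builds a panoramic knowledge map of the field, providing a systematic reference for scholars and practical guidance for marketing optimization and industry governance.
This article mines research topics in traditional Chinese medicine (TCM) over the past 30 years and summarizes their mainstream, shifts, and evolution. Master's and doctoral theses and authoritative journals in the field are crawled and divided into time periods to analyze research directions and methods, and word clouds, word frequency statistics, and the LDA topic model are used to analyze hot topics. The theses and journal articles are consolidated into 14 main research topics. Theses mainly study signaling pathways and cover both Chinese materia medica and diseases; China Journal of Chinese Materia Medica (中国中药杂志) focuses on Chinese materia medica research and statistical analysis; Journal of Traditional Chinese Medicine (中医杂志) pays more attention to the diagnosis and treatment of specific diseases. The LDA topic model effectively mines research topics in the TCM literature, and 80% of them can be verified against review articles in the corresponding areas.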
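Trending social events are closely tied to online public opinion and have become a major factor affecting social stability; the capacity to respond to such events reflects how well an online society can remain stable under fluctuation. Drawing on the characteristics and evolution patterns of trending social events and online public opinion, this paper builds an indicator system for their evolution and applies weighting, standardization, and related processing to the data. Grey relational analysis is used to obtain the degree of association among the data, and, based on cluster centers, the evolution of trending events is classified into three levels: minor, moderate, and major. Finally, an online topic model is used to summarize the topic words at each stage of an event, and the evolution path of the event is inferred from those topic words.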
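The grey relational analysis step can be sketched with Deng's classic grey relational grade. The abstract does not specify the variant, the indicators, or the weights used, so the unweighted mean, the distinguishing coefficient `rho = 0.5`, and the toy sequences below are assumptions for illustration.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate sequence against a reference.

    Sequences are assumed to be normalized to comparable scales already.
    rho is the distinguishing coefficient (0.5 is the customary default).
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    grades = []
    for row in deltas:
        # Relational coefficient per indicator, then an unweighted mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Toy indicator sequences (made-up numbers).
reference = [1.0, 2.0, 3.0]
grades = grey_relational_grades(reference, [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

A grade closer to 1 means the candidate tracks the reference more closely; a weighted mean could replace the plain average if indicator weights are available.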
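The rapid development of the Internet industry has brought new opportunities to e-commerce. Policy is an important factor guiding the e-commerce industry, so revealing how e-commerce policy evolves is significant for the industry's development. Based on policy life cycle theory, this paper divides 345 e-commerce policy documents, collected from the Chinese government portal and the official websites of the National Development and Reform Commission and the commerce bureau, into three life cycle stages. It uses the LDA topic model to extract policy topics at each stage, computes topic similarity with the Word2Vec word embedding model, and constructs the evolution path of China's e-commerce policy. The study finds that the policy has followed a gradual path from infrastructure building to system optimization and then to industrial integration, with a focus on two areas: rural e-commerce and cross-border e-commerce for agricultural products.
As online public opinion on food safety develops, its topics, sentiment, and user behavior influence one another and jointly drive it forward; studying the dynamic mechanisms among them is important for public opinion management. Taking Weibo data on the "rat head or duck neck" food safety incident as the research object, and drawing on knowledge fermentation theory (融知发酵理论), the paper considers three variables (topic evolution, sentiment fluctuation, and user behavior), builds a vector autoregression model, and combines impulse response analysis with variance decomposition to examine the dynamic mechanisms among them. It also proposes measures for managing online public opinion on food safety.
In the context of Xi'an's work to build a science and technology innovation center, this study collects the city's science and technology innovation policies from 2016 to 2025 and analyzes their evolution using a topic similarity method. It finds that the policy system has evolved from broad to fine-grained, starting from overall strategic planning and gradually moving to steady, detailed progress across multiple areas. Policy topics have become richer, covering a full range of measures including the policy environment, infrastructure, enterprise incentives, platforms, funding, commercialization of research results, talent, and project management. However, coordination among policies remains insufficient and the leading role of enterprises has yet to be realized.
This paper studies national and local education burden-reduction policies from 2018 to 2025, in particular the effects following the "Double Reduction" policy. Using Python for text mining, together with jieba word segmentation, Gephi visualization, and the LDA topic model, it analyzes the external features of the policy texts (temporal distribution, issuing bodies, high-frequency words, and document types) and their topic content. The findings are: (1) central-level burden-reduction policy peaked in 2021, while local policy issuance surged in 2021 and peaked the following year; (2) education bureaus rank first in degree, betweenness, and closeness centrality, reflecting their dominant leadership position; (3) using the issuance of "Double Reduction" and its first anniversary as dividing points, 2018 to 2025 is split into three periods, each with its own LDA topic model, yielding topics for the stage of policy gestation and initial exploration, the stage of policy piloting and initial implementation, and the stage of deepened implementation. These show that burden-reduction policy has steadily strengthened supervision of off-campus tutoring institutions, improved the quality of in-school teaching services, and reinforced coordination of resources inside and outside schools. Based on these findings, the article offers recommendations on the transformation of off-campus tutoring institutions, improvement of in-school teaching quality, policy coordination, and integration of educational resources.
Using the visualization software CiteSpace, this paper maps and reviews institutions, authors, and keywords in Chinese-language text mining literature indexed in the China Academic Journal Network Publishing Database (CNKI). Three conclusions emerge: 1) collaboration among research institutions is scattered and infrequent; 2) exchange and collaboration among scholars are not significant, and awareness of collaboration needs to improve; 3) hot topics include web mining, text classification, Chinese patent medicines, Western medicines, data stratification algorithms, big data text, and sentiment analysis, with text mining under big data and sentiment analysis being the main research trends in China.
This paper uses text mining and sentiment analysis to analyze student posts on the mobile learning platform for the course "Introduction to Linguistics". It examines how behavioral and emotional engagement before and after class change over time and how mobile learning engagement improves learning outcomes. The results show that behavioral engagement rises slightly but steadily from the start of term to the midterm and declines toward the end of term, and that post-class engagement is higher than pre-class engagement. Post-class emotional engagement follows a pattern similar to behavioral engagement, whereas pre-class emotional engagement stays persistently low and fluctuates considerably. All four measures (pre- and post-class, behavioral and emotional) correlate significantly and positively with final grades. Classes that used the mobile learning platform achieved higher final grades than classes that did not.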
A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.
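The state space construction described in this abstract is commonly written as follows. The symbols ($\beta_{t,k}$ for the natural parameters of topic $k$ in time slice $t$, $\alpha_t$ for the mean of the per-document topic proportions, and the variances $\sigma^2$, $\delta^2$, $a^2$) do not appear in the abstract and follow the usual presentation of the dynamic topic model; treat them as assumed notation.

```latex
\begin{aligned}
\beta_{t,k} \mid \beta_{t-1,k} &\sim \mathcal{N}\!\left(\beta_{t-1,k},\, \sigma^{2} I\right) \\
\alpha_{t} \mid \alpha_{t-1} &\sim \mathcal{N}\!\left(\alpha_{t-1},\, \delta^{2} I\right) \\
\eta_{t,d} &\sim \mathcal{N}\!\left(\alpha_{t},\, a^{2} I\right) \\
z_{t,d,n} &\sim \mathrm{Mult}\!\left(\pi(\eta_{t,d})\right) \\
w_{t,d,n} &\sim \mathrm{Mult}\!\left(\pi(\beta_{t,\,z_{t,d,n}})\right),
\qquad \pi(x)_{i} = \frac{\exp(x_{i})}{\sum_{j}\exp(x_{j})}
\end{aligned}
```

The Gaussian chains on $\beta$ and $\alpha$ are what make the model a state space model; because the softmax map $\pi$ breaks conjugacy with the multinomial, approximate inference such as the variational Kalman filtering mentioned in the abstract is needed.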
In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a "topic" is a pattern of word use that we expect to evolve over the course of the collection. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. In contrast to the cDTM, the original discrete-time dynamic topic model (dDTM) requires that time be discretized. Moreover, the complexity of variational inference for the dDTM grows quickly as time granularity increases, a drawback which limits fine-grained discretization. We demonstrate the cDTM on two news corpora, reporting both predictive perplexity and the novel task of time stamp prediction.
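The Brownian motion prior can be illustrated with a small simulation: between two observation times, each natural parameter of a topic receives a Gaussian increment whose variance grows in proportion to the elapsed time, so no discretization of time is needed. This is only a forward-sampling sketch under assumed names and values (`simulate_topic`, `variance`, the timestamps); it does not implement the paper's variational inference.

```python
import math
import random


def softmax(x):
    """Map natural parameters to a word distribution."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_topic(timestamps, vocab_size, variance=0.1, seed=0):
    """Sample one topic's word distributions at irregular timestamps.

    Natural parameters follow Brownian motion: each increment is
    N(0, variance * gap), where gap is the time since the previous stamp.
    """
    rng = random.Random(seed)
    beta = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    path = [softmax(beta)]
    for prev, curr in zip(timestamps, timestamps[1:]):
        std = math.sqrt(variance * (curr - prev))
        beta = [b + rng.gauss(0.0, std) for b in beta]
        path.append(softmax(beta))
    return path


# Irregularly spaced document timestamps (made-up values).
path = simulate_topic([0.0, 0.5, 3.0, 3.1], vocab_size=5)
```

Longer gaps allow larger drift in the word distribution, while closely spaced timestamps yield nearly identical topics, which is the behavior that distinguishes the continuous-time prior from a fixed-step discrete chain.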
We propose an online topic model for sequentially analyzing the time evolution of topics in document collections. Topics naturally evolve with multiple timescales. For example, some words may be used consistently over one hundred years, while other words emerge and disappear over periods of a few days. Thus, in the proposed model, current topic-specific distributions over words are assumed to be generated based on the multiscale word distributions of the previous epoch. Considering both the long-timescale dependency as well as the short-timescale dependency yields a more robust model. We derive efficient online inference procedures based on a stochastic EM algorithm, in which the model is sequentially updated using newly obtained data; this means that past data are not required to make the inference. We demonstrate the effectiveness of the proposed method in terms of predictive performance and computational efficiency by examining collections of real documents with timestamps.
Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data. To do posterior inference of DTMs, existing methods are all batch algorithms that scan the full dataset before each update of the model and make inexact variational approximations with mean-field assumptions. Due to a lack of a more scalable inference algorithm, despite the usefulness, DTMs have not captured large topic dynamics. This paper fills this research void, and presents a fast and parallelizable inference algorithm using Gibbs Sampling with Stochastic Gradient Langevin Dynamics that does not make any unwarranted assumptions. We also present a Metropolis-Hastings based $O(1)$ sampler for topic assignments for each word token. In a distributed environment, our algorithm requires very little communication between workers during sampling (almost embarrassingly parallel) and scales up to large-scale applications. We are able to learn the largest Dynamic Topic Model to our knowledge, and learned the dynamics of 1,000 topics from 2.6 million documents in less than half an hour, and our empirical results show that our algorithm is not only orders of magnitude faster than the baselines but also achieves lower perplexity.
As the development of a software project progresses, its complexity grows accordingly, making it difficult to understand and maintain. During software maintenance and evolution, software developers and stakeholders constantly shift their focus between different tasks and topics. They need to investigate software repositories (e.g., revision control systems) to know what tasks have recently been worked on and how much effort has been devoted to them. For example, if an important new feature request is received, a share of the work that developers perform ought to be relevant to the addition of the incoming feature. If this does not happen, project managers might wonder what kind of work developers are currently doing. Several topic analysis tools based on Latent Dirichlet Allocation (LDA) have been proposed to analyze information stored in software repositories to model software evolution, thus helping software stakeholders to be aware of the focus of development efforts at various times during software evolution. Previous LDA-based topic analysis tools can capture either changes in the strengths of various development topics over time (i.e., strength evolution) or changes in the content of existing topics over time (i.e., content evolution). Unfortunately, none of the existing techniques can capture both strength and content evolution. In this paper, we use Dynamic Topic Models (DTM) to analyze commit messages within a project's lifetime to capture both strength and content evolution simultaneously. We evaluate our approach by conducting a case study on commit messages of two well-known open source software systems, jEdit and PostgreSQL. The results show that our approach could capture not only how the strengths of various development topics change over time, but also how the content of each topic (i.e., words that form the topic) changes over time. Compared with existing topic analysis approaches, our approach can provide a more complete and valuable view of software evolution to help developers better understand the evolution of their projects.
This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic modeling method based on two layers of Non-negative Matrix Factorization (NMF). This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999 to 2014. Our findings suggest that two-layer NMF is a valuable alternative to existing dynamic topic modeling approaches found in the literature, and can unveil niche topics and associated vocabularies not captured by existing methods. Substantively, our findings suggest that the political agenda of the EP evolves significantly over time and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro Crisis. MEP contributions to the plenary agenda are also found to be impacted upon by voting behavior and the committee structure of the Parliament.
We propose a dynamic topic model for monitoring temporal evolution of market competition by jointly leveraging tweets and their associated images. For a market of interest (e.g. luxury goods), we aim at automatically detecting the latent topics (e.g. bags, clothes, luxurious) that are competitively shared by multiple brands (e.g. Burberry, Prada, and Chanel), and tracking temporal evolution of the brands' stakes over the shared topics. One of the key applications of our work is social media monitoring that can provide companies with temporal summaries of highly overlapped or discriminative topics with their major competitors. We design our model to correctly address three major challenges: multiview representation of text and images, modeling of competitiveness of multiple brands over shared topics, and tracking their temporal evolution. As far as we know, no previous model addresses all three challenges. For evaluation, we analyze about 10 million tweets and 8 million associated images of the 23 brands in the two categories of luxury and beer. Through experiments, we show that the proposed approach is more successful than other candidate methods for the topic modeling of competition. We also quantitatively demonstrate the generalization power of the proposed method for three prediction tasks.
Dynamic topic modeling of the COVID-19 Twitter narrative among U.S. governors and cabinet executives
A combination of federal and state-level decision making has shaped the response to COVID-19 in the United States. In this paper we analyze the Twitter narratives around this decision making by applying a dynamic topic model to COVID-19 related tweets by U.S. Governors and Presidential cabinet members. We use a network Hawkes binomial topic model to track evolving sub-topics around risk, testing and treatment. We also construct influence networks amongst government officials using Granger causality inferred from the network Hawkes process.
The vast volumes of community-contributed geotagged photos (CCGPs) available on the Web can be utilized to make travel location recommendations. The sparsity of user location interactions makes it difficult to learn travel preferences, because a user usually visits only a limited number of travel locations. Static topic models can be used to solve the sparsity problem by considering user travel topics. However, all travel histories of a user are regarded as one document drawn from a set of static topics, ignoring the evolving of topics and travel preferences. In this paper, we propose a dynamic topic model (DTM) and matrix factorization (MF)-based travel recommendation method. A DTM is used to obtain the temporally fine-grained topic distributions (i.e., implicit topic information) of users and locations. In addition, a large amount of explicit information is extracted from the metadata and visual contents of CCGPs, check-ins, and point of interest categories datasets. The information is used to obtain user-user and location-location similarity information, which is imposed as two regularization terms to constraint MF. The proposed method is evaluated on a publicly available Flickr dataset. Experimental results demonstrate that the proposed method can generate significantly superior recommendations compared to other state-of-the-art travel location recommendation studies.
In an effort to gauge the global pandemic's impact on social thoughts and behavior, it is important to answer the following questions: (1) What kinds of topics are individuals and groups vocalizing in relation to the pandemic? (2) Are there any noticeable topic trends and if so how do these topics change over time and in response to major events? In this paper, through the advanced Sequential Latent Dirichlet Allocation model, we identified twelve of the most popular topics present in a Twitter dataset collected over the period spanning April 3rd to April 13th, 2020 in the United States and discussed their growth and changes over time. These topics were both robust, in that they covered specific domains, not simply events, and dynamic, in that they were able to change over time in response to rising trends in our dataset. They spanned politics, healthcare, community, and the economy, and experienced macro-level growth over time, while also exhibiting micro-level changes in topic composition. Our approach differentiated itself in both scale and scope to study the emerging topics concerning COVID-19 at a scale that few works have been able to achieve. We contributed to the cross-sectional field of urban studies and big data. Whereas we are optimistic towards the future, we also understand that this is an unprecedented time that will have lasting impacts on individuals and society at large, impacting not only the economy or geo-politics, but human behavior and psychology. Therefore, in more ways than one, this research is just beginning to scratch the surface of what will be a concerted research effort into studying the history and repercussions of COVID-19.
No abstract
Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models, and also limit scalability. In this paper, we present several new results around DTMs. First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have a long-term memory, or that are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models based on ideas around stochastic variational inference and sparse Gaussian processes. This way we can train a rich family of DTMs on massive data. Our experiments on several large-scale datasets show that our generalized model allows us to find interesting patterns that were not accessible by previous approaches.
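The move from Wiener-process priors to general Gaussian-process priors described above comes down to swapping the covariance (kernel) function over time points. The following is a minimal sketch of that idea, not the paper's implementation: the function names, the choice of a squared-exponential (RBF) kernel as the "smooth" example, and the hyperparameters `variance` and `length_scale` are all assumptions made for illustration.

```python
import math

def wiener_kernel(s, t, variance=1.0):
    # Covariance of a Wiener process (Brownian motion) at times s, t >= 0.
    return variance * min(s, t)

def rbf_kernel(s, t, variance=1.0, length_scale=1.0):
    # Squared-exponential kernel: one common choice for smooth trajectories.
    # Its use here is an illustrative assumption, not taken from the abstract.
    return variance * math.exp(-((s - t) ** 2) / (2.0 * length_scale ** 2))

def gram_matrix(kernel, times):
    # Covariance matrix of the latent topic trajectory at the given time points.
    return [[kernel(s, t) for t in times] for s in times]

times = [1.0, 2.0, 3.0]
print(gram_matrix(wiener_kernel, times))
print(gram_matrix(rbf_kernel, times))
```

Under the Wiener kernel the variance grows with time, while the RBF kernel is stationary and its correlations decay with the distance between time points.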
No abstract
In topic tracking, a topic is usually described by several stories. How to represent a topic remains a difficult problem in topic tracking research. To emphasize the topic in stories, we provide an improved topic-based tf*idf weighting method to measure the topical importance of the features in the representation model. To overcome the topic drift problem and filter the noise in the tracked topic description, a dynamic topic model is proposed based on the static model. It extends the initial topic model with information from incoming related stories and filters the noise using the latest unrelated story. The topic tracking systems are implemented on the TDT4 Chinese corpus. The experimental results indicate that both the new weighting method and the dynamic model can improve tracking performance.
No abstract
Can "distant reading" and digital tools enhance the history of technology by revealing hitherto undetected patterns in the record? Using the parliamentary debates of Britain in the nineteenth century, this essay revisits the history of infrastructure in the British empire, asking how tools compare with secondary sources. The author applies topic modeling and dynamic topic modeling to synthesize the historical record, and finds that the results largely match the known turning points, key players, and technologies in the history of British infrastructure. In several cases, however, digital investigation draws the researcher to new results. The domains where digital tools unveil new patterns include: the improvement of the River Shannon, the alignment of political parties with particular technologies, and the chronology of building public spaces in Britain's imperial capitals. The experiment documented here validates topic models as a source for periodizing technology over time.
Cybersecurity threats continue to increase and are impacting almost all aspects of modern life. Being aware of how vulnerabilities and their exploits are changing gives helpful insights into combating new threats. Applying dynamic topic modeling to a time-stamped cybersecurity document collection shows how the significance and details of concepts found in them are evolving. We correlate two different temporal corpora, one with reports about specific exploits and the other with research-oriented papers on cybersecurity vulnerabilities and threats. We represent the documents, concepts, and dynamic topic modeling data in a semantic knowledge graph to support integration, inference, and discovery. A critical insight into discovering knowledge through topic modeling is seeding the knowledge graph with domain concepts to guide the modeling process. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using those phrases improves the quality of the topic models. Researchers can query the resulting knowledge graph to reveal important relations and trends. This work is novel because it uses topics as a bridge to relate documents across corpora over time.
Semisupervised and unsupervised systems provide operators with invaluable support and can tremendously reduce the operators' load. In light of the need to process large volumes of video data and provide autonomous decisions, this paper proposes new learning algorithms for activity analysis in video. The activities and behaviors are described by a dynamic topic model. Two novel learning algorithms based on the expectation maximization approach and variational Bayes inference are proposed. Theoretical derivations of the posterior estimates of model parameters are given. The designed learning algorithms are compared with the Gibbs sampling inference scheme introduced earlier in the literature. A detailed comparison of the learning algorithms is presented on real video data. We also propose an anomaly localization procedure, elegantly embedded in the topic modeling framework. It is shown that the developed learning algorithms can achieve a 95% success rate. The proposed framework can be applied to a number of areas, including transportation systems, security, and surveillance.
We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichlet processes used previously in dynamic topic modeling. Inference for the model includes adapting a recently developed slice sampler to directly manipulate the underlying Poisson process. Experiments performed on news, blogs, academic and Twitter collections demonstrate the technique gives superior perplexity over a number of previous models.
No abstract
No abstract
Purpose: This study aims to analyse online customer experience in the hospitality industry through dynamic topic modelling (DTM) and net promoter score (NPS). A novel model for collecting, pre-processing and analysing online reviews is proposed to understand the hidden information in the corpus and gain insight into customer experience. Design/methodology/approach: A corpus with 259,470 customer comments in English was collected. The researchers experimented and selected the best K parameter (number of topics) by perplexity and coherence score measurements as the input parameter for the model. Finally, the team experimented on the corpus using the latent Dirichlet allocation (LDA) model and DTM with the selected K to explore latent topics and trends of topics in the corpus over time. Findings: The results of the topic model show hidden topics, with their top high-probability keywords, that concern customers, and the trends of these topics over time. In addition, this study also calculated and analysed the NPS from customer rating scores and presented it on an overview dashboard. Research limitations/implications: The data used in the experiment are only a part of all user comments; therefore, they may not reflect all of the current customer experience. Practical implications: The management and business development of companies in the hotel industry can also benefit from the empirical findings from the topic model and NPS analytics, which will support decision-making to help businesses improve products and services, increase existing customer satisfaction and draw in new customers. Originality/value: This study differs from previous works in that it attempts to fill a gap in research focused on online customer experience in the hospitality industry and uses text analytics and NPS to reach this goal.
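The NPS computation mentioned above can be sketched as follows. The abstract does not state the rating scale or cut-offs used, so this example assumes the conventional definition on a 0-10 scale (promoters rate 9-10, detractors rate 0-6); the function name and sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return NPS in [-100, 100] for ratings on an assumed 0-10 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)    # assumed cut-off: 9-10
    detractors = sum(1 for r in ratings if r <= 6)   # assumed cut-off: 0-6
    # NPS = percentage of promoters minus percentage of detractors.
    return 100.0 * (promoters - detractors) / len(ratings)

sample = [10, 9, 8, 7, 6, 3, 10, 9]  # hypothetical ratings
print(net_promoter_score(sample))  # 4 promoters, 2 detractors, 8 ratings -> 25.0
```

If the review platform uses a different scale (for example 1-5 stars), the cut-offs would need to be remapped before this formula applies.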
In this paper the authors build on prior literature to develop an adaptive and time-varying metadata-enabled dynamic topic model (mDTM) and apply it to a large Weibo dataset using an online Gibbs sampler for parameter estimation. Their approach simultaneously captures the maximum number of inherent dynamic features of microblogs, thereby setting it apart from other online document mining methods in the extant literature. In summary, the authors' results show a better performance of mDTM in terms of the quality of the mined information compared to prior research and showcase mDTM as a promising tool for the effective mining of microblogs in a rapidly changing global information space.
Topic models have proven to be a useful tool for discovering latent structures in document collections. However, many document collections come as temporal streams, and thus several aspects of the latent structure, such as the number of topics and the topics' distribution and popularity, are time-evolving. Several models exist that model the evolution of some but not all of the above aspects. In this paper we introduce infinite dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between the documents is maintained across epochs. iDTM allows for an unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluate the efficacy of our model on both simulated and real datasets, with favorable results.
No abstract
No abstract
No abstract
Research on learning analytics and educational data mining has been published since the first conference on Educational Data Mining (EDM) in 2008 and gained momentum through the establishment of the Learning Analytics and Knowledge (LAK) conference in 2011. This paper addresses the LAK Data Challenge from the perspective of visual analytics of topic dynamics in the LAK Dataset between 2008 and 2012. The data set was processed using probabilistic, dynamic topic mining algorithms. To enable exploration and visual analysis of the resulting topic model by LAK researchers and stakeholders we developed and deployed D-VITA, a web-based browsing tool for dynamic topic models. In this paper we explore answers to the questions about past, present, and future of LAK posed in the Data Challenge based on a topic model of all papers in the LAK Dataset. We also briefly describe how users can explore the LAK topic model on their own using D-VITA. 1. OBJECTIVES: The LAK Data Challenge called for contributions to make sense of the field of learning analytics including its "roots, current state, and future trends, based on how its members report and debate their research". This paper tackles the challenge by presenting facts obtained from statistical analyses of the paper full texts included in the provided LAK Dataset [7]. The main contributions are as follows: 1. A dynamic topic model was computed using the approach presented in [3]. Using this dynamic topic model we explore in Section 4 three questions about the evolution of topics in the LAK Dataset to distill knowledge about past, present and future of LAK research. 2. In Section 5 we describe the visual analytics application D-VITA, which puts the toolkit to answer the
No abstract
Topic modeling is a machine learning technique that identifies latent topics in a text corpus. There are several existing tools that allow end-users to create and explore topic models using graphical user interfaces. In this paper, we present a visual analytics system for dynamic topic models that goes beyond the existing breed of tools. First, it decouples the Web-based user interface from the underlying data sets, enabling exploration of arbitrary text data sets in the Web browser. Second, it allows users to explore dynamic topic models, while existing tools are often limited to static topic models. Finally, it comes with a tool server in the backend that allows the design and execution of scientific workflows to build topic models from any data source. The system is demonstrated by building and exploring a dynamic topic model of CIKM proceedings published since 2001.
Understanding how topics in scientific literature evolve is an interesting and important problem. Previous work simply models each paper as a bag of words and also considers the impact of authors. However, the impact of one document on another as captured by citations, one important inherent element in scientific literature, has not been considered. In this paper, we address the problem of understanding topic evolution by leveraging citations, and develop citation-aware approaches. We propose an iterative topic evolution learning framework by adapting the Latent Dirichlet Allocation model to the citation network and develop a novel inheritance topic model. We evaluate the effectiveness and efficiency of our approaches and compare with the state of the art approaches on a large collection of more than 650,000 research papers in the last 16 years and the citation network enabled by CiteSeerX. The results clearly show that citations can help to understand topic evolution better.
We propose a method for discovering the dependency relationships between the topics of documents shared in social networks using the latent social interactions, attempting to answer the question: given a seemingly new topic, from where does this topic evolve? In particular, we seek to discover the pair-wise probabilistic dependency in topics of documents which associate social actors from a latent social network, where these documents are being shared. By viewing the evolution of topics as a Markov chain, we estimate a Markov transition matrix of topics by leveraging social interactions and topic semantics. Metastable states in a Markov chain are applied to the clustering of topics. Applied to the CiteSeer dataset, a collection of documents in academia, we show the trends of research topics, how research topics are related and which are stable. We also show how certain social actors, authors, impact these topics and propose new ways for evaluating author impact.
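Viewing topic evolution as a Markov chain, as described above, requires a row-stochastic transition matrix over topics. The paper estimates this matrix by leveraging social interactions and topic semantics; the sketch below substitutes a much simpler technique, plain maximum-likelihood counting over observed topic sequences, purely to illustrate the data structure. The function name and the integer topic labels are assumptions.

```python
def transition_matrix(sequences, num_topics):
    """Estimate P[i][j] = Pr(next topic = j | current topic = i) by counting."""
    counts = [[0] * num_topics for _ in range(num_topics)]
    for seq in sequences:
        # Count each consecutive (current, next) topic pair in the sequence.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # No outgoing transitions observed: fall back to a uniform row.
            matrix.append([1.0 / num_topics] * num_topics)
    return matrix

# Hypothetical topic sequences, with topics labeled 0 and 1.
print(transition_matrix([[0, 1, 1], [0, 0, 1]], num_topics=2))
```

Each row sums to one, so the result can be used directly for analyses such as identifying stable (metastable) groups of topics.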
No abstract
Document collections evolve over time: new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in finite document sequences assuming a fixed vocabulary. In this study, we propose "Topic Monitor" for the monitoring and understanding of topic and vocabulary evolution over an infinite document sequence, i.e. a stream. We use Probabilistic Latent Semantic Analysis (PLSA) for topic modeling and propose new folding-in techniques for topic adaptation under an evolving vocabulary. We extract a series of models, on which we detect index-based topic threads as human-interpretable descriptions of topic evolution.
No abstract
No abstract
No abstract
Topic discovery and evolution (TDE) is a problem of long-standing interest in the research community. The goal in topic discovery is to identify groups of keywords from large corpora so that the information in those corpora is summarized succinctly. The nature of text corpora has changed dramatically in the past few years with the advent of social media. Social media services allow users to constantly share, follow and comment on posts from other users. Hence, such services have given a new dimension to the traditional text corpus: today's corpora have a social context embedded in them in terms of the community of users interested in a particular post, their profiles, etc. We wish to harness this social context that comes along with the textual content for TDE. In particular, our goal is to both qualitatively and quantitatively analyze when social context actually helps with TDE. Methodologically, we approach the problem of TDE by proposing a non-negative matrix factorization (NMF) based model that incorporates both the textual information and social context information. We perform experiments on a large-scale real-world dataset of news articles, and use Twitter as the platform providing information about the social context of these news articles. We compare with and outperform several state-of-the-art baselines. Our conclusion is that using the social context information is most useful when faced with topics that are particularly difficult to detect.
No abstract
This study explores how the concept and research on the water-energy-food (WEF) nexus has evolved over time. The research uncovers the key terms underpinning the phenomenon, maps the interlinkages between WEF nexus topics, and provides an overview of the evolution of the concept of WEF nexus. We analyzed published academic literature from the Scopus database and performed both qualitative and quantitative analyses using natural language processing methods. The findings suggest that the nexus approach is increasingly evolving into an integrative concept, and has been incorporating new topics over time, resulting in different methods for WEF nexus research, with a focus on interdisciplinary and inter-sectoral analyses. Across the five periods outlined, we have identified that the nexus approach debate focused on the following predominant topics: i) Trend 1 (2012–2016) debates on WEF nexus for water management and natural resource security, ii) Trend 2 (2017–2018) linkages between the nexus, the sustainable development goals and green economy, iii) Trend 3 (2019) WEF nexus governance and policy integration, iv) Trend 4 (2020) application of the nexus concept on different scales, including regions, countries, watersheds, urban areas as well as other components coupled to the WEF nexus, and v) Trend 5 (2021) climate change and urban nexus challenges.
This paper analyses topic segmentation based on the LDA (Latent Dirichlet Allocation) model, and performs topic segmentation and topic evolution analysis of the stem cell research literature in PubMed from 2001 to 2012 by combining the HMM (Hidden Markov Model) and co-occurrence theory. Stem cell research topics were obtained with LDA, and expert judgements were made on these topics to test the feasibility of the model classification. Further, the correlation between topics was analysed. HMM was used to predict the trend evolution of topics over various years, and a time series map was used to visualize the evolutionary relationships among the stem cell topics.
This paper introduces WordStream, an interactive visual tool for the demonstration of topic evolution. Our approach combines two popular techniques: word clouds, which are designed to give an engaging visualization of text via font sizes and colors, and stacked graphs, which are a common method for visualizing topic evolution. In particular, WordStream emphasizes essential terms chronologically and spatially. To show the usefulness of WordStream, we demonstrate its applications on various data sets, including the Huffington Post and IEEE VIS publications.
No abstract
No abstract
No abstract
Purpose: We present an analytical, open source and flexible natural language processing and text mining method for topic evolution, emerging topic detection and research trend forecasting for all kinds of data-tagged text. Design/methodology/approach: We make full use of the functions provided by the open source VOSviewer and Microsoft Office, including a thesaurus for data clean-up and a LOOKUP function for comparative analysis. Findings: Through application and verification in the domain of perovskite solar cells research, this method proves to be effective. Research limitations: A certain amount of manual data processing and a specific research domain background are required for better, more illustrative analysis results. Adequate time for analysis is also necessary. Practical implications: We try to set up an easy, useful, and flexible interdisciplinary text analysis procedure for researchers, especially those without solid computer programming skills or who cannot easily access complex software. This procedure can also serve as an example for teaching information literacy. Originality/value: This text analysis approach has not been reported before.
Research on the topic evolution of microblogs is an effective way to analyze online public opinion. This paper proposes a method for mining how microblog topics change over time, and realizes topic evolution analysis through topic extraction and topic relevance calculation. First, the latent Dirichlet allocation (LDA) model is used to automatically extract topics from different time slices. Second, a similarity calculation algorithm is designed to compute the relevance of topic content by normalizing the similarities among characteristic words and their co-occurrence relations, yielding the evolutionary relationships among sub-topics of different time slices. Third, the article-topic probability distribution is used to calculate the topic intensity in each time slice, which gives the evolution of topic intensity over time. Experiments show that the proposed topic evolution analysis model can effectively detect the evolution of topic content and intensity in real blogs.
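The second and third steps of this pipeline (linking topics across adjacent time slices and tracking topic intensity) can be sketched as below. This is a simplified illustration under stated assumptions: cosine similarity over topic-word distributions stands in for the paper's similarity measure, which also uses co-occurrence relations; topic intensity is assumed to be the mean document-topic probability within a slice; and the function names, the `threshold` parameter, and the toy inputs are hypothetical. Topic extraction itself (the LDA step) is assumed to have been done elsewhere.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topics given as {word: probability} dicts."""
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in set(p) | set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def link_topics(prev_topics, next_topics, threshold=0.5):
    """Return (i, j, similarity) for topic pairs in adjacent slices above threshold."""
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            s = cosine(p, q)
            if s >= threshold:
                links.append((i, j, s))
    return links

def topic_intensity(doc_topic_rows):
    """Mean document-topic probability per topic within one time slice."""
    n = len(doc_topic_rows)
    return [sum(col) / n for col in zip(*doc_topic_rows)]

# Hypothetical topics from two adjacent time slices.
slice_1 = [{"vaccine": 0.5, "trial": 0.5}]
slice_2 = [{"vaccine": 0.5, "trial": 0.5}, {"election": 1.0}]
print(link_topics(slice_1, slice_2))
print(topic_intensity([[0.2, 0.8], [0.6, 0.4]]))
```

The threshold controls how readily a topic in one slice is treated as the continuation of a topic in the previous slice; a lower value produces more, weaker evolutionary links.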
No abstract
Social media short texts like tweets and instant messages provide a lot of valuable information about hot topics and public opinion. Detecting and tracking topics from these online contents can help people grasp the essential information and its evolution, and can facilitate many applications. Topic evolution models built on LDA require the number of topics to be set manually; it cannot change across time periods and cannot be adjusted based on the contents. Nonparametric topic evolution models do not perform very well on short texts due to the data sparsity problem. In this paper, we therefore propose a nonparametric topic evolution model for short texts. The model uses the recurrent Chinese restaurant process as the prior distribution of topic proportions. Combining it with word co-occurrence modeling, we construct a topic evolution model that is suitable for social media short texts. We carry out experimental studies on a Twitter dataset. The results show that our method outperforms the baseline methods and can monitor topic evolution in social media short texts effectively.
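To illustrate how a recurrent Chinese restaurant process lets the number of topics change between epochs, the sketch below computes the prior probability of assigning the next document to each existing topic or to a new one. It assumes the common one-epoch-memory formulation (weight of an existing topic proportional to its count in the previous epoch plus its count so far in the current epoch, weight of a new topic proportional to a concentration parameter `alpha`). The paper's exact prior and its word co-occurrence component are not reproduced, and all names are hypothetical.

```python
def rcrp_prior(prev_counts, curr_counts, alpha):
    """Prior over topic assignment for the next document in the current epoch.

    prev_counts / curr_counts map topic id -> number of documents assigned in
    the previous epoch / so far in the current epoch.
    """
    topics = set(prev_counts) | set(curr_counts)
    weights = {k: prev_counts.get(k, 0) + curr_counts.get(k, 0) for k in topics}
    # Topics with zero weight have died out and cannot be chosen.
    weights = {k: w for k, w in weights.items() if w > 0}
    weights["new"] = alpha  # opening a brand-new topic
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical counts: topic 0 was popular last epoch, topic 1 is active now.
print(rcrp_prior({0: 3, 1: 1}, {1: 2}, alpha=2.0))
```

A topic that receives no documents in an epoch carries no weight into the epoch after next, which is how topics can disappear without a preset topic count.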
No abstract
No abstract
This study applied LDA (latent Dirichlet allocation) and regression analysis to conduct a lead-lag analysis to identify different topic evolution patterns between preprints and papers from arXiv and the Web of Science (WoS) in astrophysics over the last 20 years (1992–2011). Fifty topics in arXiv and WoS were generated using an LDA algorithm and then regression models were used to explain 4 types of topic growth patterns. Based on the slopes of the fitted equation curves, the paper redefines the topic trends and popularity. Results show that arXiv and WoS share similar topics in a given domain, but differ in evolution trends. Topics in WoS lose their popularity much earlier and their durations of popularity are shorter than those in arXiv. This work demonstrates that open access preprints have a stronger growth tendency compared to traditional printed publications.
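Classifying a topic trend by the slope of a fitted curve can be illustrated with an ordinary least-squares line through a topic's yearly share. The abstract does not specify the regression form used for its four growth patterns, so the linear model, the function name, and the sample values below are assumptions.

```python
def trend_slope(years, shares):
    """Least-squares slope of topic share against year (assumed linear model)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx  # requires at least two distinct years

# Hypothetical yearly shares of one topic.
years = [2000, 2001, 2002, 2003]
shares = [0.10, 0.20, 0.30, 0.40]
slope = trend_slope(years, shares)
print("rising" if slope > 0 else "falling or flat", slope)
```

A positive slope marks a topic gaining popularity and a negative slope one losing it; comparing slopes for the same topic in two corpora gives a simple view of which source leads.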
This paper investigates the collaboration patterns and research topic trends in the publications of the IEEE Transactions on Intelligent Transportation Systems (T-ITS) over the past decade. We find that coauthorship is prevalent and that the coauthorship networks possess the scale-free property on high degree nodes. Collaborations usually occur within the same research institutions and countries. Interorganization/region collaboration structures are usually connected through a few productive/high-impact authors. Typical international collaborations are between the U.S. and other countries such as China, Germany, U.K., and Italy. Active topics studied in IEEE T-ITS publications in the past ten years include traffic management and machine vision, among others. Authors can be partitioned into common interest groups, of which machine vision and automatic vehicle control attract more researchers.
No abstract
With the advance of mobile technologies, geographical records can easily be embedded in data to form location-associated documents. For example, in Twitter, the location of tweets can be identified by the GPS locations or IP addresses of smartphones. In Flickr, photos may be tagged and recorded with GPS locations. With this geographical information, it becomes possible to model users' interests in different regions so as to determine the corresponding marketing strategy. Due to its potential for providing personalized and context-aware services, several pieces of work have started to explore this area. One stream of work tries to discover users' interest topics from location-associated documents. These models work under the assumption that words close in geographical position are likely to be clustered into the same geographical topic. However, they do this in a static mode; that is, they do not consider the evolution of the topics. In addition, they have to specify the total number of topics for the corpus in advance. In order to utilize the geographical information and to model the change of topics, we propose a location-based topic evolution (LBTE) model to tackle the above issues. The main advantage of our model is that it can reveal the appearance and disappearance of topics in different regions. Moreover, topics are determined automatically from the location-associated documents, and their total number is not restricted to a preset value. Finally, we conduct a series of experiments on both synthetic and real-world datasets to demonstrate the merits of our proposed LBTE model in capturing users' interest topics.
No abstract
For companies acting on a global scale, the necessity to monitor and analyze news channels and consumer-generated media on the Web, such as weblogs and newsgroups, is steadily increasing. In particular, the identification of novel trends and upcoming issues, as well as their dynamic evolution over time, is of utmost importance to corporate communications and market analysts. Automated machine learning systems using clustering techniques have only partially succeeded in addressing these newly arising requirements, failing to properly assign short-term hype topics to long-term trends. We propose an approach that monitors news wires at different levels of temporal granularity, extracting key phrases that reflect short-term topics as well as longer-term trends by means of statistical language modelling. Moreover, our approach allows windows of smaller scope to be assigned to those of longer intervals.
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
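The idea of attaching a continuous distribution over timestamps to each topic can be sketched as follows. The abstract does not name the distribution; this example assumes a Beta distribution over timestamps normalized to the open interval (0, 1), fitted by the method of moments, which is a simple stand-in for full model inference. Function names and sample timestamps are hypothetical.

```python
import math

def fit_beta(timestamps):
    """Method-of-moments Beta(a, b) fit for timestamps normalized to (0, 1).

    Assumes the sample variance is positive and smaller than mean * (1 - mean).
    """
    n = len(timestamps)
    mean = sum(timestamps) / n
    var = sum((t - mean) ** 2 for t in timestamps) / n
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))

# Hypothetical normalized timestamps of the words assigned to one topic.
a, b = fit_beta([0.2, 0.4, 0.6, 0.8])
print(a, b, beta_pdf(0.5, a, b))
```

Under this assumption, the fitted density shows when a topic is most active, and the timestamp likelihood can be combined with word likelihoods when scoring a document's topic assignments.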
No abstract
In this paper we present a new approach to the analysis of topics and their dynamics over time. Given a large amount of news text on a daily basis, we have identified “hotly discussed” concepts by examining the contextual shift between the time slices. We adopt the volatility measure from econometrics and propose a new algorithm for frequency-independent detection of topic drift.
No abstract
No abstract
A single, stationary topic model such as latent Dirichlet allocation is inappropriate for modeling corpora that span long time periods, as the popularity of topics is likely to change over time. A number of models that incorporate time have been proposed, but in general they either exhibit limited forms of temporal variation, or require computationally expensive inference methods. In this paper we propose nonparametric Topics over Time (npTOT), a model for time-varying topics that allows an unbounded number of topics and flexible distribution over the temporal variations in those topics’ popularity. We develop a collapsed Gibbs sampler for the proposed model and compare against existing models on synthetic and real document sets.
No abstract
Climate change remains a primary threat to inland fishes and fisheries. Using topic modeling to examine trends and relationships across 36 years of scientific literature on documented and projected climate impacts to inland fish, we identify ten representative topics within this body of literature: assemblages, climate scenarios, distribution, climate drivers, population growth, invasive species, populations, phenology, physiology, and reproduction. These topics are largely similar to the output from artificial intelligence application (i.e., ChatGPT) search prompts, but with some key differences. The field of climate impacts on fish has seen dramatic growth since the mid-2000s with increasing popularity of topics related to drivers, assemblages, and phenology. The topics were generally well-dispersed with little overlap of common words, apart from phenology and reproduction which were closely clustered. Pairwise comparisons between topics revealed potential gaps in the literature including between reproduction and distribution and between physiology and phenology. A better understanding of these relationships can help capitalize on existing literature to inform conservation and sustainable management of inland fishes with a changing climate.
The Topics over Time (TOT) model lets users track how particular topics change over time. The proposed method divides a dataset of security blog posts into fixed-length periods that overlap one another and feeds each partition to TOT. The results suggest that the extracted topics include malware and attack campaign names suitable for multi-labeling cyber threat intelligence reports.
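The partitioning step described above can be illustrated with a minimal sketch. It only shows how dated posts might be split into fixed-length windows that overlap; the function names, the 30-day period, and the 15-day overlap are hypothetical choices for illustration, not values taken from the paper, and the topic model itself is not included.

```python
from datetime import date, timedelta


def overlapping_windows(start, end, period_days, overlap_days):
    """Yield (window_start, window_end) pairs covering [start, end).

    Each window spans `period_days`; consecutive windows share
    `overlap_days` days. Both parameters are illustrative assumptions.
    """
    if not 0 <= overlap_days < period_days:
        raise ValueError("overlap_days must be in [0, period_days)")
    step = timedelta(days=period_days - overlap_days)
    length = timedelta(days=period_days)
    cursor = start
    while cursor < end:
        # The last windows are clipped so they never run past `end`.
        yield cursor, min(cursor + length, end)
        cursor += step


def partition_posts(posts, period_days, overlap_days):
    """Group (date, text) posts into overlapping windows."""
    if not posts:
        return []
    first = min(d for d, _ in posts)
    last = max(d for d, _ in posts) + timedelta(days=1)
    return [
        ((lo, hi), [text for d, text in posts if lo <= d < hi])
        for lo, hi in overlapping_windows(first, last, period_days, overlap_days)
    ]


# Toy data: three hypothetical blog posts, identified by a single letter.
posts = [
    (date(2020, 1, 1), "a"),
    (date(2020, 1, 20), "b"),
    (date(2020, 2, 5), "c"),
]
partitions = partition_posts(posts, period_days=30, overlap_days=15)
for (lo, hi), texts in partitions:
    print(lo, hi, texts)
```

Because neighbouring windows share posts, a topic that straddles a period boundary appears in both partitions, which is the property the overlap is meant to provide. Each partition's texts would then be passed to whichever topic model is in use.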
In this paper, we describe a method for statistical learning from speech documents that we apply to the Congressional Record in order to gain new insight into the dynamics of the political agenda. Prior efforts to evaluate the attention of elected representatives across topic areas have been expensive manual coding exercises and are generally circumscribed along one or more features of detail: limited time periods, high levels of temporal aggregation, coarse topical categories, and so on. Conversely, the Congressional Record has scarcely been used for such analyses, largely because it contains too much information to absorb. We describe here a method for inferring, through the patterns of word choice in each speech and the dynamics of word choice patterns across time, (a) what the topics of speeches are, and (b) the probability that attention will be paid to any given topic or set of topics over time. We use the model to examine the agenda in the United States Senate from 1997-2004, a database of over 70 thousand documents containing over 70 million words. We estimate the model for 42 topics and provide evidence that we can reveal speech topics that are both distinctive and inter-related in substantively meaningful ways. We demonstrate further that the dynamics estimated by our model give us leverage on important questions about the political agenda.
How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006. We induce topic clusters using Latent Dirichlet Allocation, and examine the strength of each topic over time. Our methods find trends in the field including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline of research in semantics and understanding between 1978 and 2001, possibly rising again after 2001. We also introduce a model of the diversity of ideas, topic entropy, using it to show that COLING is a more diverse conference than ACL, but that both conferences as well as EMNLP are becoming broader over time. Finally, we apply Jensen-Shannon divergence of topic distributions to show that all three conferences are converging in the topics they cover.
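The two quantities named in this abstract, topic entropy and Jensen-Shannon divergence, have standard definitions that can be sketched in a few lines. The venue distributions below are made-up toy numbers, not values from the paper, and the base-2 logarithm is an assumption chosen so that the divergence lies between 0 and 1.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: H(M) - (H(P) + H(Q)) / 2, M = (P + Q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


# Hypothetical topic distributions for two venues over four topics.
venue_a = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly: high entropy
venue_b = [0.70, 0.10, 0.10, 0.10]  # attention concentrated: lower entropy

print(entropy(venue_a), entropy(venue_b))
print(js_divergence(venue_a, venue_b))
```

A venue whose papers spread evenly over topics has higher entropy, which is the sense in which one conference can be called more diverse than another. A divergence that shrinks from year to year between two venues indicates that their topic coverage is converging.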
This paper presents Online Topic Model (OLDA), a topic model that automatically captures the thematic patterns and identifies emerging topics of text streams and their changes over time. Our approach allows the topic modeling framework, specifically the Latent Dirichlet Allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the Empirical Bayes method is proposed. The idea is to incrementally update the current model according to the information inferred from the new stream of data with no need to access previous data. The dynamics of the proposed approach also provide an efficient means to track the topics over time and detect the emerging topics in real time. Our method is evaluated both qualitatively and quantitatively using benchmark datasets. In our experiments, OLDA discovered interesting patterns by analyzing just a fraction of the data at a time. Our tests also demonstrate the ability of OLDA to align the topics across the epochs, through which the evolution of the topics over time is captured. OLDA is also comparable to, and sometimes better than, the original LDA in predicting the likelihood of unseen documents.
The words we use to talk about the current epidemiological crisis on social media can inform us on how we are conceptualizing the pandemic and how we are reacting to its development. This paper provides an extensive explorative analysis of how the discourse about Covid-19 reported on Twitter changes through time, focusing on the first wave of this pandemic. Based on an extensive corpus of tweets (produced between 20th March and 1st July 2020), we first show how the topics associated with the development of the pandemic changed through time, using topic modeling. Second, we show how the sentiment polarity of the language used in the tweets changed from a relatively positive valence during the first lockdown toward a more negative valence in correspondence with the reopening. Third, we show how the average subjectivity of the tweets increased linearly and, fourth, how the popular and frequently used figurative frame of WAR changed when real riots and fights entered the discourse.
Directed links in social media could represent anything from intimate friendships to common interests, or even a passion for breaking news or celebrity gossip. Such directed links determine the flow of information and hence indicate a user's influence on others, a concept that is crucial in sociology and viral marketing. In this paper, using a large amount of data collected from Twitter, we present an in-depth comparison of three measures of influence: indegree, retweets, and mentions. Based on these measures, we investigate the dynamics of user influence across topics and time. We make several interesting observations. First, popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Second, most influential users can hold significant influence over a variety of topics. Third, influence is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. We believe that these findings provide new insights for viral marketing and suggest that topological measures such as indegree alone reveal very little about the influence of a user.
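A small sketch can make the three influence measures concrete. It counts indegree, retweets, and mentions per user from toy edge lists and ranks users by each count. The user names, the edge-list layout, and the tie-breaking rule are all hypothetical, and this does not reproduce the paper's data or its statistical comparison.

```python
from collections import Counter

# Each pair is (source user, target user); all names are invented.
follows = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
retweets = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("bob", "alice")]
mentions = [("alice", "carol"), ("bob", "carol"), ("dave", "alice")]


def influence_counts(edges):
    """Count how often each user is the target of an edge."""
    return Counter(target for _, target in edges)


def ranking(counts):
    """Users ordered by descending count; ties broken alphabetically."""
    return [user for user, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


indegree_rank = ranking(influence_counts(follows))    # followers received
retweet_rank = ranking(influence_counts(retweets))    # retweets received
mention_rank = ranking(influence_counts(mentions))    # mentions received

print(indegree_rank, retweet_rank, mention_rank)
```

In this toy data the most-followed user is not the most-retweeted or most-mentioned one, which mirrors the abstract's first observation that high indegree does not by itself imply influence under the other two measures.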
The Engineering Design field is growing fast, and so is the number of sub-fields that bring value to researchers working in this context. From psychology to neuroscience, from mathematics to machine learning, scholars and practitioners produce new knowledge of potential interest for designers every day. This complicates the work of researchers who want to quickly and easily find literature on a specific topic among a large number of scientific publications, or who want to position new research effectively. In the present paper, we address this problem by using state-of-the-art text mining techniques on a large corpus of Engineering Design related documents. In particular, a topic modelling technique is applied to all the papers published in the ICED proceedings from 2003 to 2017 (3,129 documents) in order to find the main subtopics of Engineering Design. Finally, we analyze the trends of these topics over time to give a bird's-eye view of how the Engineering Design field is evolving. The results offer a clear, bottom-up picture of what Engineering Design is and how the interest of researchers in different topics has changed over time.
Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest. Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt spikes in the appearance of particular named entities. However, these approaches are less well suited to the identification of content that spreads widely and then fades over time scales on the order of days - the time scale at which we perceive news and events.
No abstract
Online education provides data from students solving problems at different levels of proficiency over time. Unfortunately, methods that use these data for inferring student knowledge rely on costly domain expertise. We propose three novel data-driven methods that bridge sequence modeling with topic models to infer students' time-varying knowledge. These methods differ in complexity,
Industry and Skill Wage Premiums in East Asia (Emanuela di Gropello and Chris Sakellariou, Policy Research Working Papers, published August 2010, https://doi.org/10.1596/1813-9450-5379). This paper focuses on the estimation of skill/industry premiums and labor force composition at the national and sector levels in seven East Asian countries with the objective of providing a comprehensive analysis of trends in demand for skills in the region. The paper addresses the following questions: Are there converging or diverging trends in the region regarding the evolution of skill premiums and labor force composition? Are changes in skill premiums generalized or industry-related? How have industry premiums evolved? The analysis uses labor and household surveys going back at least 10 years. The main trends emerging from the analysis are: (a) increasing proportions of skilled/educated workers over the long run across the region; (b) generally increasing demand for skills in the region; (c) the service sector has become the most important driver of demand for skills for all countries (except Thailand); (d) countries can be broadly categorized into three groups in relation to trends and patterns of demand for skills (Indonesia, Philippines, and Thailand; Vietnam and China; and Cambodia and Mongolia); and (e) industry premiums have increased in three countries of the region (Philippines, Thailand, and Cambodia). These trends point to several policy implications, including that governments should focus on policies promoting access to education to address the increasing demand for skills and/or persistent skill shortages; support general rather than specific curricula given broad-based increases in skill premiums in most countries; better tailor curriculum design and content and pedagogical approaches to the needs of the service sector; and target some social protection programs to unskilled workers to protect them from the "unequalizing" impact of education.
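This report divides research on topic evolution into five dimensions: theoretical optimization of the underlying models and algorithms (DTM, online learning, and semantic enhancement); empirical bibliometric studies of scientific literature (disciplinary trends and knowledge maps); dynamic monitoring of public opinion and public policy (social media and government texts); commercial applications in vertical industries (finance, software, healthcare, and similar settings); and multidimensional data association with interactive visual analytics (system development, geospatial analysis, and user personalization). Overall, the research trend is moving from single-text topics toward multimodal fusion across spatial and temporal dimensions, and from general-purpose models toward domain adaptation and interactive, human-in-the-loop visualization systems.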