A Critical Discourse Analysis of News Reports within Fairclough's Three-Dimensional Model
Evidence from news discourse: post-publication headline rewrites and resources annotated for media framing/bias
Focus on reusable data resources and annotation schemes: the phenomenon of news headlines being edited or rewritten after publication, with bias-type annotation, classification, and downstream framing/engagement analysis built around that resource. This provides a quantifiable starting point for textual evidence in CDA of how discourse organizes experience and constructs stance (a minimal edit-extraction sketch follows the entry below).
- The MediaSpin Dataset: Post-Publication News Headline Edits Annotated for Media Bias(Preetika Verma, Kokil Jaidka, 2024, ArXiv Preprint)
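As a concrete starting point for working with such a resource, the sketch below extracts token-level edit operations from a pre-/post-publication headline pair using Python's standard difflib. The record fields and the example pair are illustrative assumptions, not the MediaSpin schema.

```python
# A minimal sketch, assuming headline pairs come as {"before": ..., "after": ...}
# records; these field names are illustrative, not the MediaSpin schema.
from difflib import SequenceMatcher

def headline_edits(before: str, after: str):
    """Return (op, before_span, after_span) tuples describing the rewrite."""
    a, b = before.split(), after.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b).get_opcodes():
        if tag != "equal":
            ops.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return ops

pair = {
    "before": "Protesters clash with police at downtown rally",
    "after": "Rioters attack police at downtown rally",
}
print(headline_edits(pair["before"], pair["after"]))
# -> [('replace', 'Protesters clash with', 'Rioters attack')]
```

Edit spans extracted this way can then be handed to an annotator or classifier for bias-type labeling.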
Computational identification in news discourse: actor/agent reference (pronoun → canonical name)
Centers on a key step in the "discourse network" of news reports: actor identification from pronominal reference to canonical actor names, tackling the construction of naming/coreference chains and evaluating traditional pipelines against LLM and hybrid models. This supplies an operational method for the text-level CDA evidence of who speaks and who is constructed as the acting subject.
- Actor Identification in Discourse: A Challenge for LLMs?(Ana Barić, Sean Papay, Sebastian Padó, 2024, ArXiv Preprint)
Computational representation of discourse structure and rhetorical organization: discourse trees/RST for downstream evaluation (machine translation as the example)
Encodes discourse structure (e.g., Rhetorical Structure Theory) as computable tree representations and applies it to automatic evaluation tasks (e.g., coherence-oriented machine translation evaluation), treating discourse organization and rhetorical relations as core modeling features. This supports evidence-based analysis, at the textual dimension of CDA, of cohesion, hierarchical relations, and the mechanisms of implicit stance.
- DiscoTK: Using Discourse Structure for Machine Translation Evaluation(Shafiq Joty, Francisco Guzman, Lluis Marquez, Preslav Nakov, 2019, ArXiv Preprint)
Computational modeling of discourse markers and discourse relation signals: connectives / implicit relations / topic segmentation / anchoring
The common thread is taking discourse markers, discourse relations, and discourse organization signals as the analytical handle: examining how discourse connectives affect computational understanding, identifying discourse markers and predicting dialog acts, learning implicit discourse relations with neural models, improving topic segmentation with preceding-sentence context and dependency information, and extending to multiple genres and anchoring schemes (not limited to WSJ news). Together these results serve CDA's computable evidence chain for how texts realize ideological work and organize stance through cohesion and relation types.
- When Do Discourse Markers Affect Computational Sentence Understanding?(Ruiqi Li, Liesbeth Allein, Damien Sileo, Marie-Francine Moens, 2023, ArXiv Preprint)
- Identifying Discourse Markers in Spoken Dialog(Peter A. Heeman, Donna Byron, James F. Allen, 1998, ArXiv Preprint)
- A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations(Samuel Rönnqvist, Niko Schenk, Christian Chiarcos, 2017, ArXiv Preprint)
- Improving Topic Segmentation by Injecting Discourse Dependencies(Linzi Xing, Patrick Huber, Giuseppe Carenini, 2022, ArXiv Preprint)
- Beyond The Wall Street Journal: Anchoring and Comparing Discourse Signals across Genres(Yang Liu, 2019, ArXiv Preprint)
- On the Role of Context for Discourse Relation Classification in Scientific Writing(Stephen Wan, Wei Liu, Michael Strube, 2025, ArXiv Preprint)
- Discourse Structure in Machine Translation Evaluation(Shafiq Joty, Francisco Guzmán, Lluís Màrquez, Preslav Nakov, 2017, ArXiv Preprint)
- The distribution of discourse relations within and across turns in spontaneous conversation(S. Magalí López Cortez, Cassandra L. Jacobs, 2023, ArXiv Preprint)
- A Neural Approach to Discourse Relation Signal Detection(Amir Zeldes, Yang Liu, 2020, ArXiv Preprint)
- ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations(Chunkit Chan, Jiayang Cheng, Weiqi Wang, Yuxin Jiang, Tianqing Fang, Xin Liu, Yangqiu Song, 2023, ArXiv Preprint)
- GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection(Yue Yu, Yilun Zhu, Yang Liu, Yan Liu, Siyao Peng, Mackenzie Gong, Amir Zeldes, 2019, ArXiv Preprint)
- Embedding Mental Health Discourse for Community Recommendation(Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang, 2023, ArXiv Preprint)
- A Joint Model of Conversational Discourse and Latent Topics on Microblogs(Jing Li, Yan Song, Zhongyu Wei, Kam-Fai Wong, 2018, ArXiv Preprint)
- Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse(Bonnie Lynn Webber, Aravind K. Joshi, 1998, ArXiv Preprint)
Theoretical and tooling foundations of linguistic formalization and grammar/parsing frameworks (symbolic-level representational capacity)
This group supplies the foundations CDA needs for a computable description of text structure: formal language theory such as constraint grammars and categorial grammars, construction grammar and its links to AI, engineering experience from grammar development, and robust evaluation of language identification. It is complemented by parsing environments and formal systems that make it possible to formalize discourse cues into structural/interface representations.
- The Power of Constraint Grammars Revisited(Anssi Yli-Jyrä, 2017, ArXiv Preprint)
- Construction Grammar and Artificial Intelligence(Katrien Beuls, Paul Van Eecke, 2023, ArXiv Preprint)
- Experiences with the GTU grammar development environment(Martin Volk, Dirk Richarz, 1997, ArXiv Preprint)
- Robust Language Identification for Romansh Varieties(Charlotte Model, Sina Ahmadi, Jannis Vamvas, 2026, ArXiv Preprint)
- Construction Grammar and Language Models(Harish Tayyar Madabushi, Laurence Romain, Petar Milin, Dagmar Divjak, 2023, ArXiv Preprint)
- Minimalist Grammars and Minimalist Categorial Grammars, definitions toward inclusion of generated languages(Maxime Amblard, 2011, ArXiv Preprint)
- TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering(Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert, Kilian Evang, 2008, ArXiv Preprint)
- Classical linear logic, cobordisms and categorical semantics of categorial grammars(Sergey Slavnov, 2018, ArXiv Preprint)
- Evaluation of a Grammar of French Determiners(Eric Laporte, 2007, ArXiv Preprint)
- Order preserving and order reversing operators on the class of convex functions in Banach spaces(Alfredo N. Iusem, Daniel Reem, Benar F. Svaiter, 2012, ArXiv Preprint)
- Functions on Symmetric Spaces and Oscillator Representations(Hongyu He, 2006, ArXiv Preprint)
Politicization, manipulation, and collective discourse dynamics on social media: coordination detection, topic shifts, moral framing, and graph exploration
Targets political/ideological, manipulation-related, and affect-related phenomena on social media: detecting coordinated behavior, measuring politicization through topic shifts, combining BERTopic with Moral Foundations Theory to track the evolution of political topics and their moral framing, using graph exploration to discover hidden nodes/influencers in unknown networks, and applying multi-task learning to recognize stress and depression from early posts. The core move is to couple discourse content with group interaction and network-structural processes and to quantify them (a minimal co-retweet similarity sketch follows the entries below).
- Coordinated Behavior on Social Media in 2019 UK General Election(Leonardo Nizzoli, Serena Tardelli, Marco Avvenuti, Stefano Cresci, Maurizio Tesconi, 2020, ArXiv Preprint)
- Topic Shifts as a Proxy for Assessing Politicization in Social Media(Marcelo Sartori Locatelli, Pedro Calais, Matheus Prado Miranda, João Pedro Junho, Tomas Lacerda Muniz, Wagner Meira, Virgilio Almeida, 2023, ArXiv Preprint)
- Modeling Political Discourse with Sentence-BERT and BERTopic(Margarida Mendonca, Alvaro Figueira, 2025, ArXiv Preprint)
- Exploring Unknown Social Networks for Discovering Hidden Nodes(Sho Tsugawa, Hiroyuki Ohsaki, 2025, ArXiv Preprint)
- Multitask learning for recognizing stress and depression in social media(Loukas Ilias, Dimitris Askounis, 2023, ArXiv Preprint)
- Label-dependent Feature Extraction in Social Networks for Node Classification(Tomasz Kajdanowicz, Przemyslaw Kazienko, Piotr Doskocz, 2013, ArXiv Preprint)
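As referenced above, a minimal sketch of the coordination-detection idea: link accounts whose sets of retweeted posts overlap strongly. The toy data and the plain Jaccard threshold are assumptions; the 2019 UK election study itself uses TF-IDF-weighted co-retweet vectors and multiscale network filtering.

```python
# A toy sketch of coordination detection via co-retweet similarity:
# users who retweet a highly overlapping set of tweets get linked.
from itertools import combinations

retweets = {                      # user -> set of retweeted tweet ids (toy data)
    "u1": {1, 2, 3, 4},
    "u2": {1, 2, 3, 5},
    "u3": {9, 10},
    "u4": {1, 2, 3, 4, 5},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

THRESHOLD = 0.5                   # similarity above which we draw a link
edges = [
    (u, v, round(jaccard(retweets[u], retweets[v]), 2))
    for u, v in combinations(retweets, 2)
    if jaccard(retweets[u], retweets[v]) >= THRESHOLD
]
print(edges)  # [('u1', 'u2', 0.6), ('u1', 'u4', 0.8), ('u2', 'u4', 0.8)]
```

Dense components of the resulting user-user graph are the candidates for coordinated communities.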
News/information recommendation and influence graphs: modeling multi-layer relations in social networks and inferring personalized diffusion
Realizes information diffusion and relation generation around the multi-layer relations of social networks: on one hand, explicitly modeling diffusion and influence graphs for microblog news recommendation and time-varying preference profiling; on the other, distinguishing social ties from semantic (object-based) ties in a multidimensional network framework and generating recommendations/suggestions by adapting layer weights. For CDA this provides the computational external context of how media content is selected and diffused in networked practice.
- IGNiteR: News Recommendation in Microblogging Applications (Extended Version)(Yuting Feng, Bogdan Cautis, 2022, ArXiv Preprint)
- Multidimensional Social Network in the Social Recommender System(Przemyslaw Kazienko, Katarzyna Musial, Tomasz Kajdanowicz, 2013, ArXiv Preprint)
Disinformation and propaganda on social media: detection, characterization, and modeling of diffusion dynamics
Addresses false information, propaganda, and manipulation on social media: diffusion-network modeling, detection/clustering pipelines, and dynamics-based evaluation characterize a computable evidence chain of actors, content, and diffusion, and provide mechanism-level external constraints that help CDA connect how power and institutions operate through the reproduction of discourse to traceable diffusion processes.
- Proppy: A System to Unmask Propaganda in Online News(Alberto Barrón-Cedeño, Giovanni Da San Martino, Israa Jaradat, Preslav Nakov, 2019, ArXiv Preprint)
- Keeping it Authentic: The Social Footprint of the Trolls Network(Ori Swed, Sachith Dassanayaka, Dimitri Volchenkov, 2024, ArXiv Preprint)
- Measuring social spam and the effect of bots on information diffusion in social media(Emilio Ferrara, 2017, ArXiv Preprint)
- Clustering Memes in Social Media(Emilio Ferrara, Mohsen JafariAsbagh, Onur Varol, Vahed Qazvinian, Filippo Menczer, Alessandro Flammini, 2013, ArXiv Preprint)
- Quantifying Social Network Dynamics(Radosław Michalski, Piotr Bródka, Przemysław Kazienko, Krzysztof Juszczyszyn, 2013, ArXiv Preprint)
- Assessing Individual and Community Vulnerability to Fake News in Social Networks(Bhavtosh Rath, Wei Gao, Jaideep Srivastava, 2021, ArXiv Preprint)
- Empirical Evaluation of Link Deletion Methods for Limiting Information Diffusion on Social Media(Shiori Furukawa, Sho Tsugawa, 2026, ArXiv Preprint)
Predicting the factuality and bias of news media: joint cross-modal/cross-lingual modeling and integration of multi-source evidence
Treats factuality and bias/ideology as predictable targets and builds evaluation frameworks at the outlet/organization level: feature engineering and machine-learning modeling, joint prediction of factuality and bias, integration of multi-source signals (articles, Wikipedia, social media, traffic, etc.), and specialized LLMs that improve cross-lingual news understanding and analysis, offering CDA a quantifiable reference point for text-to-ideology interpretation (a minimal text-only classification sketch follows the entries below).
- Predicting Factuality of Reporting and Bias of News Media Sources(Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov, 2018, ArXiv Preprint)
- What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context(Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov, 2020, ArXiv Preprint)
- A Survey on Predicting the Factuality and the Bias of News Media(Preslav Nakov, Husrev Taha Sencar, Jisun An, Haewoon Kwak, 2021, ArXiv Preprint)
- Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media(Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov, 2019, ArXiv Preprint)
- LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content(Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Maram Hasanain, Sahinur Rahman Laskar, Naeemul Hassan, Firoj Alam, 2024, ArXiv Preprint)
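The sketch referenced above shows only the text-only baseline shape of outlet-level bias classification with scikit-learn; the outlets, texts, and labels are invented, and the cited papers add Wikipedia, social media, and traffic signals on top of such a text model.

```python
# A minimal text-only sketch of outlet-level bias classification:
# vectorize a sample of each outlet's articles and fit a classifier.
# All texts and labels below are toy illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

outlet_texts = [
    "tax cuts spur growth markets rally freedom",
    "climate justice workers rights healthcare for all",
    "city council approves budget after public hearing",
    "border security crime wave patriots demand action",
    "union organizers win landmark contract victory",
    "committee releases quarterly infrastructure report",
]
labels = ["right", "left", "center", "right", "left", "center"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(outlet_texts, labels)
print(model.predict(["lawmakers debate healthcare and workers rights"]))
```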
Reproducible experimentation and engineering infrastructure for social media research: real-time intervention measurement and sustainable software practices
Focuses on the executability, reproducibility, and engineering sustainability of the research process: real-time intervention and measurement on social media feeds (usable for reducing research bias and testing causal effects), together with software practices that support adaptation to new architectures and long-term maintenance. This group provides methodological support for evidence collection, intervention validation, and reproducibility quality control in CDA research.
- Reranking Social Media Feeds: A Practical Guide for Field Experiments(Tiziano Piccardi, Martin Saveski, Chenyan Jia, Jeffrey Hancock, Jeanne L. Tsai, Michael S. Bernstein, 2024, ArXiv Preprint)
- Experiments in Sustainable Software Practices for Future Architectures(Charles R. Ferenbaugh, 2013, ArXiv Preprint)
Conversation trees and causal attribution for interpretable structure/contribution estimation (a structure-effect-explanation framework)
Emphasizes structured causal or contribution attribution: on one hand, causal discovery and structural causal models estimate the contribution of influence channels (directional and interpretable); on the other, conversation trees jointly learn discourse roles and topics. The group complements CDA with interpretable structure-and-effect modeling of how discourse and channels produce influence.
- Causal-driven attribution (CDA): Estimating channel influence without user-level data(Georgios Filippou, Boi Mai Quach, Diana Lenghel, Arthur White, Ashish Kumar Jha, 2025, ArXiv Preprint)
- A Joint Model of Conversational Discourse and Latent Topics on Microblogs(Jing Li, Yan Song, Zhongyu Wei, Kam-Fai Wong, 2018, ArXiv Preprint)
Not directly related: interdisciplinary background in mathematics/physics/astronomy/discrete algorithms/3D engineering (excluded from the main news CDA analysis)
These entries belong mainly to mathematics, physics, numerical astronomy, 3D engineering, and discrete or algorithmic theory; within the given initial grouping they do not form direct textual evidence or a computational-discourse thread for news discourse analysis. To avoid overlap with the core news/discourse/social media groups, they are collected here as not-directly-related background (retained, but not used in the synthesis of core CDA evidence).
- Stochastic integration in Riemannian manifolds from a functional-analytic point of view(Alexandru Mustăţea, 2022, ArXiv Preprint)
- Infinite Dimensional Ito Algebras of Quantum White Noise(V. P. Belavkin, 2005, ArXiv Preprint)
- IVOA Recommendation: Data Model for Astronomical DataSet Characterisation(Mireille Louys, Anita Richards, Francois Bonnarel, Alberto Micol, Igor Chilingarian, Jonathan McDowell, the IVOA Data Model Working Group, 2011, ArXiv Preprint)
- Thetis coastal ocean model: discontinuous Galerkin discretization for the three-dimensional hydrostatic equations(Tuomas Kärnä, Stephan C. Kramer, Lawrence Mitchell, David A. Ham, Matthew D. Piggott, António M. Baptista, 2017, ArXiv Preprint)
- Maximum Lebesgue Extension of Monotone Convex Functions(Keita Owari, 2013, ArXiv Preprint)
- Point transitivity, $Δ$-transitivity and multi-minimality(Zhijing Chen, Jian Li, Jie Lü, 2013, ArXiv Preprint)
- Higher-Order Functions and Brouwer's Thesis(Jonathan Sterling, 2016, ArXiv Preprint)
- On three-dimensional Alexandrov spaces(Fernando Galaz-Garcia, Luis Guijarro, 2013, ArXiv Preprint)
- A lattice model with a theta term in three dimensions(Srinath Cheluvaraja, 1999, ArXiv Preprint)
- Description of three-dimensional evolution algebras(Yolanda Cabrera Casado, Mercedes Siles Molina, M. Victoria Velasco, 2017, ArXiv Preprint)
- A new tool in nuclear physics: Nuclear lattice simulations(Ulf-G. Meißner, 2015, ArXiv Preprint)
- Practical Reasoning in DatalogMTL(Dingmin Wang, Przemysław A. Wałęga, Pan Hu, Bernardo Cuenca Grau, 2024, ArXiv Preprint)
- Practical Trade-Offs for the Prefix-Sum Problem(Giulio Ermanno Pibiri, Rossano Venturini, 2020, ArXiv Preprint)
- A Three-Dimensional GUI for Windows Explorer(David Carter, Luiz Fernando Capretz, 2015, ArXiv Preprint)
- GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction(Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, Dacheng Tao, 2023, ArXiv Preprint)
- A Novel Corpus of Discourse Structure in Humans and Computers(Babak Hemmatian, Sheridan Feucht, Rachel Avram, Alexander Wey, Muskaan Garg, Kate Spitalnic, Carsten Eickhoff, Ellie Pavlick, Bjorn Sandstede, Steven Sloman, 2021, ArXiv Preprint)
- Embedding Mental Health Discourse for Community Recommendation(Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang, 2023, ArXiv Preprint)
Taken together, this literature supports a multi-layer computational evidence framework for a critique of news discourse under Fairclough's three-dimensional model. At the **social practice** level, social network structure and diffusion dynamics identify disinformation/propaganda and the mechanisms of the actors behind it, while methods for politicization and manipulation characterize collective discourse dynamics. At the **media/discursive reproduction** level, factuality and bias prediction (including cross-lingual and multi-source signals) turns text-stance relations into quantifiable reference points, and reproducible social media experiments and engineering infrastructure safeguard evidence quality. At the **text/discourse** level, actor reference identification, discourse-tree representations, and the automatic recognition and modeling of discourse markers and relation signals provide operational linguistic evidence for cohesion, relation types, and the workings of implicit stance. In addition, a body of formal language/grammar and parsing theory serves as a toolbox for symbolic representation of text structure, and a small set of not-directly-related interdisciplinary background items is set aside to avoid cross-group contamination.
A total of 79 related references.
Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals and that signaling information is indicative of genres. While several corpora exist with discourse relation signaling information such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018), they both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. Thus, this paper adapts the signal identification and anchoring scheme (Liu and Zeldes, 2019) to three more genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset.
We present proppy, the first publicly available real-world, real-time propaganda detection system for online news, which aims at raising awareness, thus potentially limiting the impact of propaganda and helping fight disinformation. The system constantly monitors a number of news sources, deduplicates and clusters the news into events, and organizes the articles about an event on the basis of the likelihood that they contain propagandistic content. The system is trained on known propaganda sources using a variety of stylistic features. The evaluation results on a standard dataset show state-of-the-art results for propaganda detection.
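A hedged sketch of the general recipe behind such systems: train a classifier on articles from known propagandistic vs. non-propagandistic sources using style-oriented features, then rank new articles by predicted probability. The character n-gram features and toy texts below stand in for proppy's richer stylistic feature set.

```python
# Article-level propaganda scoring with character n-gram TF-IDF features
# and logistic regression; the training texts are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "They will stop at NOTHING to destroy our great nation!!!",
    "Only a fool would believe the lies of the corrupt elite.",
    "The ministry published its annual budget figures on Tuesday.",
    "Researchers reported a modest increase in regional rainfall.",
]
labels = [1, 1, 0, 0]  # 1 = propagandistic source, 0 = non-propagandistic

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
# Rank unseen articles by probability of propagandistic content.
print(clf.predict_proba(["Patriots, rise up before THEY silence us forever!"])[:, 1])
```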
In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality.
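To make the tree-kernel idea concrete, the toy sketch below counts complete subtrees shared by two RST-like trees encoded as nested tuples; DiscoTK uses richer all-subtree convolution kernels over several tree transformations, so this only illustrates the basic intuition.

```python
# A toy "shared complete subtrees" kernel between two RST-like trees encoded
# as nested tuples (relation, child, child, ...) with string leaves standing
# in for elementary discourse units (EDUs).
from collections import Counter

def canon(tree, bag):
    """Return a canonical string for `tree` and record every complete subtree."""
    if isinstance(tree, str):                     # leaf: an EDU placeholder
        key = tree
    else:
        label, *children = tree
        key = "(" + label + " " + " ".join(canon(c, bag) for c in children) + ")"
    bag[key] += 1
    return key

def subtree_kernel(t1, t2):
    """Count pairs of identical complete subtrees shared by the two trees."""
    b1, b2 = Counter(), Counter()
    canon(t1, b1)
    canon(t2, b2)
    return sum(b1[k] * b2[k] for k in b1.keys() & b2.keys())

reference  = ("Elaboration", "EDU_claim", ("Cause", "EDU_reason", "EDU_result"))
hypothesis = ("Elaboration", "EDU_claim", ("Contrast", "EDU_reason", "EDU_result"))
print(subtree_kernel(reference, hypothesis))      # 3: only the three leaves match
```

A kernel score like this can be combined linearly with surface-level MT metrics, which is the shape of the approach described above.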
Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To address this issue, we organize microblog messages as conversation trees based on their reposting and replying relations, and propose an unsupervised model that jointly learns word distributions to represent: 1) different roles of conversational discourse, 2) various latent topics in reflecting content information. By explicitly distinguishing the probabilities of messages with varying discourse roles in containing topical words, our model is able to discover clusters of discourse words that are indicative of topical content. In an automatic evaluation on large-scale microblog corpora, our joint model yields topics with better coherence scores than competitive topic models from previous studies. Qualitative analysis on model outputs indicates that our model induces meaningful representations for both discourse and topics. We further present an empirical study on microblog summarization based on the outputs of our joint model. The results show that the jointly modeled discourse and topic representations can effectively indicate summary-worthy content in microblog conversations.
In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.
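The model-stacking pattern itself is easy to illustrate with scikit-learn's StackingClassifier: several base classifiers feed a meta-learner. The random features and labels below are placeholders; GumDrop's real stacks operate on token-level features for segmentation and connective detection.

```python
# A minimal model-stacking sketch: base classifiers feed a metalearner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # toy token feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy "is a unit boundary" labels

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),  # the metalearner
    cv=3,
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))
```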
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for detailed discourse analysis of text generation by providing preliminary evidence that less numerous, shorter and more often incoherent clause relations are associated with lower perceived quality of computer-generated narratives and arguments.
The capabilities and use cases of automatic natural language processing (NLP) have grown significantly over the last few years. While much work has been devoted to understanding how humans deal with discourse connectives, this phenomenon is understudied in computational systems. Therefore, it is important to put NLP models under the microscope and examine whether they can adequately comprehend, process, and reason within the complexity of natural language. In this chapter, we introduce the main mechanisms behind automatic sentence processing systems step by step and then focus on evaluating discourse connective processing. We assess nine popular systems in their ability to understand English discourse connectives and analyze how context and language understanding tasks affect their connective comprehension. The results show that NLP systems do not process all discourse connectives equally well and that the computational processing complexity of different connective kinds is not always consistently in line with the presumed complexity order found in human processing. In addition, while humans are more inclined to be influenced during the reading procedure but not necessarily in the final comprehension performance, discourse connectives have a significant impact on the final accuracy of NLP systems. The richer knowledge of connectives a system learns, the more negative effect inappropriate connectives have on it. This suggests that the correct explicitation of discourse connectives is important for computational natural language processing.
We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. We experiment with five transformations and augmentations of a base discourse tree representation based on the rhetorical structure theory, and we combine the kernel scores for each of them into a single score. Finally, we add other metrics from the ASIYA MT evaluation toolkit, and we tune the weights of the combination on actual human judgments. Experiments on the WMT12 and WMT13 metrics shared task datasets show correlation with human judgments that outperforms what the best systems that participated in these years achieved, both at the segment and at the system level.
Recent neural supervised topic segmentation models achieve superior effectiveness over unsupervised methods, given the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability caused by exploiting simple linguistic cues for prediction while overlooking more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model with the injection of above-sentence discourse dependency structures to encourage the model to make topic boundary predictions based more on the topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy can substantially improve its performance on intra-domain and out-of-domain data, with little increase in the model's complexity.
With the increasing use of generative Artificial Intelligence (AI) methods to support science workflows, we are interested in the use of discourse-level information to find supporting evidence for AI generated scientific claims. A first step towards this objective is to examine the task of inferring discourse structure in scientific writing. In this work, we present a preliminary investigation of pretrained language model (PLM) and Large Language Model (LLM) approaches for Discourse Relation Classification (DRC), focusing on scientific publications, an under-studied genre for this task. We examine how context can help with the DRC task, with our experiments showing that context, as defined by discourse structure, is generally helpful. We also present an analysis of which scientific discourse relation types might benefit most from context.
Policy makers and managers sometimes assess the share of research produced by a group (country, department, institution). This takes the form of the percentage of publications in a journal, field or broad area that has been published by the group. This quantity is affected by essentially random influences that obscure underlying changes over time and differences between groups. A model of research production is needed to help identify whether differences between two shares indicate underlying differences. This article introduces a simple production model for indicators that report the share of the world's output in a journal or subject category, assuming that every new article has the same probability to be authored by a given group. With this assumption, confidence limits can be calculated for the underlying production capability (i.e., probability to publish). The results of a time series analysis of national contributions to 36 large monodisciplinary journals 1996-2016 are broadly consistent with this hypothesis. Follow up tests of countries and institutions in 26 Scopus subject categories support the conclusions but highlight the importance of ensuring consistent subject category coverage.
Time pressure and topic negotiation may impose constraints on how people leverage discourse relations (DRs) in spontaneous conversational contexts. In this work, we adapt a system of DRs for written language to spontaneous dialogue using crowdsourced annotations from novice annotators. We then test whether discourse relations are used differently across several types of multi-utterance contexts. We compare the patterns of DR annotation within and across speakers and within and across turns. Ultimately, we find that different discourse contexts produce distinct distributions of discourse relations, with single-turn annotations creating the most uncertainty for annotators. Additionally, we find that the discourse relation annotations are of sufficient quality to predict from embeddings of discourse units.
Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, Delta s (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized word embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.
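Schematically, a delta-softmax style score is the drop in the gold relation's probability when one token is masked. In the sketch below, relation_prob is a hand-written toy stand-in for a trained discourse relation classifier so that the example runs end to end; the paper computes the same quantity from a distantly supervised neural model's softmax.

```python
# Per-token signaling score: probability of the gold relation with the full
# text minus the probability with that token masked (positive = signal).
MASK = "<mask>"

def relation_prob(tokens, relation):
    """Toy classifier: 'however' strongly signals Contrast, 'and' weakly."""
    score = 0.2
    if relation == "Contrast":
        score += 0.6 * ("however" in tokens) + 0.1 * ("and" in tokens)
    return min(score, 1.0)

def delta_s(tokens, relation):
    full = relation_prob(tokens, relation)
    return {
        tok: full - relation_prob(tokens[:i] + [MASK] + tokens[i + 1:], relation)
        for i, tok in enumerate(tokens)
    }

sent = "however the plan failed and costs rose".split()
print(delta_s(sent, "Contrast"))
# {'however': 0.6, 'the': 0.0, ..., 'and': 0.1, ...}
```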
The identification of political actors who put forward claims in public debate is a crucial step in the construction of discourse networks, which are helpful to analyze societal debates. Actor identification is, however, rather challenging: Often, the locally mentioned speaker of a claim is only a pronoun ("He proposed that [claim]"), so recovering the canonical actor name requires discourse understanding. We compare a traditional pipeline of dedicated NLP components (similar to those applied to the related task of coreference) with an LLM, which appears to be a good match for this generation task. Evaluating on a corpus of German actors in newspaper reports, we find surprisingly that the LLM performs worse. Further analysis reveals that the LLM is very good at identifying the right reference, but struggles to generate the correct canonical form. This points to an underlying issue in LLMs with controlling generated output. Indeed, a hybrid model combining the LLM with a classifier to normalize its output substantially outperforms both initial models.
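The normalization step that the hybrid model adds on top of the LLM can be illustrated with a toy canonical-name lookup: map a free-form mention onto the closest entry in a known actor list via fuzzy string matching. The actor list and mentions are invented, and a real pipeline would resolve pronouns through coreference before this step.

```python
# A toy normalization step: map a free-form actor mention onto the closest
# canonical actor name from a known list (illustrative stand-in for the
# hybrid model's output normalizer).
from difflib import get_close_matches

CANONICAL_ACTORS = ["Angela Merkel", "Olaf Scholz", "Annalena Baerbock"]

def normalize_actor(mention, cutoff=0.6):
    match = get_close_matches(mention, CANONICAL_ACTORS, n=1, cutoff=cutoff)
    return match[0] if match else None

print(normalize_actor("Scholz"))   # -> 'Olaf Scholz' (surname-only mention)
print(normalize_actor("he"))       # -> None (pronouns need coreference first)
```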
We study three-dimensional Alexandrov spaces with a lower curvature bound, focusing on extending three classical results on three-dimensional manifolds: First, we show that a closed three-dimensional Alexandrov space of positive curvature, with at least one topological singularity, must be homeomorphic to the suspension of the real projective plane; we use this to classify, up to homeomorphism, closed, positively curved Alexandrov spaces of dimension three. Second, we classify closed three-dimensional Alexandrov spaces of nonnegative curvature. Third, we study the well-known Poincaré Conjecture in dimension three, in the context of Alexandrov spaces, in the two forms it is usually formulated for manifolds. We first show that the only three-dimensional Alexandrov space that is also a homotopy sphere is the 3-sphere; then we give examples of closed, geometric, simply connected three-dimensional Alexandrov spaces for five of the eight Thurston geometries, proving along the way the impossibility of getting such examples for the Nil, $\widetilde{\mathrm{SL}_2(\mathbb{R})}$ and Sol geometries. We conclude the paper by proving the analogue of the geometrization conjecture for closed three-dimensional Alexandrov spaces.
We here explore a ``fully'' lexicalized Tree-Adjoining Grammar for discourse that takes the basic elements of a (monologic) discourse to be not simply clauses, but larger structures that are anchored on variously realized discourse cues. This link with intra-sentential grammar suggests an account for different patterns of discourse cues, while the different structures and operations suggest three separate sources for elements of discourse meaning: (1) a compositional semantics tied to the basic trees and operations; (2) a presuppositional semantics carried by cue phrases that freely adjoin to trees; and (3) general inference, that draws additional, defeasible conclusions that flesh out what is conveyed compositionally.
This paper aims to quantitatively evaluate the performance of ChatGPT, an interactive large language model, on inter-sentential relations such as temporal relations, causal relations, and discourse relations. Given ChatGPT's promising performance across various tasks, we proceed to carry out thorough evaluations on the whole test sets of 11 datasets, including temporal and causal relations, PDTB2.0-based, and dialogue-based discourse relations. To ensure the reliability of our findings, we employ three tailored prompt templates for each task, including the zero-shot prompt template, zero-shot prompt engineering (PE) template, and in-context learning (ICL) prompt template, to establish the initial baseline scores for all popular sentence-pair relation classification tasks for the first time. Through our study, we discover that ChatGPT exhibits exceptional proficiency in detecting and reasoning about causal relations, albeit it may not possess the same level of expertise in identifying the temporal order between two events. While it is capable of identifying the majority of discourse relations with existing explicit discourse connectives, the implicit discourse relation remains a formidable challenge. Concurrently, ChatGPT demonstrates subpar performance in the dialogue discourse parsing task that requires structural understanding in a dialogue before being aware of the discourse relation.
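For illustration, the snippet below builds zero-shot and in-context-learning prompts for sentence-pair discourse relation classification; the wording and the PDTB-style label set are assumptions, not the paper's exact templates.

```python
# Illustrative zero-shot and in-context-learning (ICL) prompt templates for
# sentence-pair discourse relation classification.
LABELS = ["Comparison", "Contingency", "Expansion", "Temporal"]

def zero_shot_prompt(arg1, arg2):
    return (
        f"Classify the discourse relation between the two arguments.\n"
        f"Options: {', '.join(LABELS)}.\n"
        f"Argument 1: {arg1}\nArgument 2: {arg2}\nAnswer:"
    )

def icl_prompt(examples, arg1, arg2):
    demos = "\n".join(
        f"Argument 1: {a1}\nArgument 2: {a2}\nAnswer: {lab}"
        for a1, a2, lab in examples
    )
    return demos + "\n" + zero_shot_prompt(arg1, arg2)

print(zero_shot_prompt("The market fell sharply.",
                       "Investors had expected weak earnings."))
```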
Social media has reshaped political discourse, offering politicians a platform for direct engagement while reinforcing polarization and ideological divides. This study introduces a novel topic evolution framework that integrates BERTopic-based topic modeling with Moral Foundations Theory (MFT) to analyze the longevity and moral dimensions of political topics in Twitter activity during the 117th U.S. Congress. We propose a methodology for tracking dynamic topic shifts over time and measuring their association with moral values and quantifying topic persistence. Our findings reveal that while overarching themes remain stable, granular topics tend to dissolve rapidly, limiting their long-term influence. Moreover, moral foundations play a critical role in topic longevity, with Care and Loyalty dominating durable topics, while partisan differences manifest in distinct moral framing strategies. This work contributes to the field of social network analysis and computational political discourse by offering a scalable, interpretable approach to understanding moral-driven topic evolution on social media.
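The moral-framing association step can be sketched as a simple keyword-overlap profile between topic word lists and a Moral Foundations lexicon. In the study, the topic words would come from BERTopic over congressional tweets and the lexicon from the MFT dictionaries; both the mini-lexicon and the topics below are toy stand-ins.

```python
# Associate topics with Moral Foundations categories by keyword overlap.
MFT_LEXICON = {
    "Care": {"protect", "care", "safety", "harm", "children"},
    "Fairness": {"fair", "equal", "justice", "rights"},
    "Loyalty": {"nation", "patriot", "loyal", "unity"},
    "Authority": {"law", "order", "authority", "respect"},
    "Sanctity": {"sacred", "pure", "faith"},
}

topics = {   # topic id -> top words (would come from BERTopic in practice)
    0: ["healthcare", "children", "protect", "families", "safety"],
    1: ["border", "law", "order", "nation", "security"],
}

def moral_profile(topic_words):
    words = set(topic_words)
    return {f: len(words & lex) for f, lex in MFT_LEXICON.items() if words & lex}

for tid, words in topics.items():
    print(tid, moral_profile(words))
# 0 {'Care': 3}
# 1 {'Loyalty': 1, 'Authority': 2}
```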
Our paper investigates the use of discourse embedding techniques to develop a community recommendation system that focuses on mental health support groups on social media. Social media platforms provide a means for users to anonymously connect with communities that cater to their specific interests. However, with the vast number of online communities available, users may face difficulties in identifying relevant groups to address their mental health concerns. To address this challenge, we explore the integration of discourse information from various subreddit communities using embedding techniques to develop an effective recommendation system. Our approach involves the use of content-based and collaborative filtering techniques to enhance the performance of the recommendation system. Our findings indicate that the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process.
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of the information can be used to help predict the following words. We contrast this approach with an alternative machine learning approach proposed by Litman (1996). This paper also argues that discourse markers can be used to help the hearer predict the role that the upcoming utterance plays in the dialog. Thus discourse markers should provide valuable evidence for automatic dialog act prediction.
Optimisation problems are ubiquitous in particle and astrophysics, and involve locating the optimum of a complicated function of many parameters that may be computationally expensive to evaluate. We describe a number of global optimisation algorithms that are not yet widely used in particle astrophysics, benchmark them against random sampling and existing techniques, and perform a detailed comparison of their performance on a range of test functions. These include four analytic test functions of varying dimensionality, and a realistic example derived from a recent global fit of weak-scale supersymmetry. Although the best algorithm to use depends on the function being investigated, we are able to present general conclusions about the relative merits of random sampling, Differential Evolution, Particle Swarm Optimisation, the Covariance Matrix Adaptation Evolution Strategy, Bayesian Optimisation, Grey Wolf Optimisation, and the PyGMO Artificial Bee Colony, Gaussian Particle Filter and Adaptive Memory Programming for Global Optimisation algorithms.
In the last years, chiral effective field theory has been successfully developed for and applied to systems with few nucleons. Here, I present a new approach for ab initio calculations of nuclei that combines these precise and systematic forces with Monte Carlo simulation techniques that allow for exact solutions of the nuclear A-body problem. A short introduction of this method is given and a few assorted results concerning the spectrum and structure of 12C and 16O are presented. The framework further allows one to study the properties of nuclei in worlds that have fundamental parameters different from the ones in Nature. This allows for a physics test of the anthropic principle by addressing the question how strongly the generation of the life-relevant elements depends on the light quark masses and the electromagnetic fine structure constant.
We classify three dimensional evolution algebras over a field having characteristic different from 2 and in which there are roots of orders 2, 3 and 7.
Three-dimension will be a characteristic of future user interfaces, although we are just starting to gain an understanding of how users can navigate and share information within a virtual 3D environment. Three-dimensional graphical user interfaces (3D-GUI) raise many issues of design, metaphor and usability. This research is devoted to designing a 3D-GUI as a front-end tool for a file management system, in this case, for Microsoft Windows\c{opyright} Explorer; as well as evaluating the efficiency of a 3D application. The software design was implemented by extending the Half-Life 3D engine. This extension provides a directory traversal and basic file management functions, like cut, copy, paste, delete, and so on. This paper shows the design and implementation of a real-world application that contains an efficient 3D-GUI.
Cooperative Driving Automation (CDA) has garnered increasing research attention, yet the role of intelligent infrastructure remains insufficiently explored. Existing solutions offer limited support for addressing long-tail challenges, real-synthetic data fusion, and heterogeneous sensor management. This paper introduces CDA-SimBoost, a unified framework that constructs infrastructure-centric simulation environments from real-world data. CDA-SimBoost consists of three main components: a Digital Twin Builder for generating high-fidelity simulator assets based on sensor and HD map data, OFDataPip for processing both online and offline data streams, and OpenCDA-InfraX, a high-fidelity platform for infrastructure-focused simulation. The system supports realistic scenario construction, rare event synthesis, and scalable evaluation for CDA research. With its modular architecture and standardized benchmarking capabilities, CDA-SimBoost bridges real-world dynamics and virtual environments, facilitating reproducible and extensible infrastructure-driven CDA studies. All resources are publicly available at https://github.com/zhz03/CDA-SimBoost
The increasing popularity of social media promotes the proliferation of fake news, which has caused significant negative societal effects. Therefore, fake news detection on social media has recently become an emerging research area of great concern. With the development of multimedia technology, fake news attempts to utilize multimedia content with images or videos to attract and mislead consumers for rapid dissemination, which makes visual content an important part of fake news. Despite the importance of visual content, our understanding of the role of visual content in fake news detection is still limited. This chapter presents a comprehensive review of the visual content in fake news, including the basic concepts, effective visual features, representative detection methods and challenging issues of multimedia fake news detection. This chapter can help readers to understand the role of visual content in fake news detection, and effectively utilize visual content to assist in detecting multimedia fake news.
We introduce a Content-based Document Alignment approach (CDA), an efficient method to align multilingual web documents based on content in creating parallel training data for machine translation (MT) systems operating at the industrial level. CDA works in two steps: (i) projecting documents of a web domain to a shared multilingual space; then (ii) aligning them based on the similarity of their representations in such space. We leverage lexical translation models to build vector representations using TF-IDF. CDA achieves performance comparable with state-of-the-art systems in the WMT-16 Bilingual Document Alignment Shared Task benchmark while operating in multilingual space. Besides, we created two web-scale datasets to examine the robustness of CDA in an industrial setting involving up to 28 languages and millions of documents. The experiments show that CDA is robust, cost-effective, and is significantly superior in (i) processing large and noisy web data and (ii) scaling to new and low-resourced languages.
Unstructured grid ocean models are advantageous for simulating the coastal ocean and river-estuary-plume systems. However, unstructured grid models tend to be diffusive and/or computationally expensive which limits their applicability to real life problems. In this paper, we describe a novel discontinuous Galerkin (DG) finite element discretization for the hydrostatic equations. The formulation is fully conservative and second-order accurate in space and time. Monotonicity of the advection scheme is ensured by using a strong stability preserving time integration method and slope limiters. Compared to previous DG models advantages include a more accurate mode splitting method, revised viscosity formulation, and new second-order time integration scheme. We demonstrate that the model is capable of simulating baroclinic flows in the eddying regime with a suite of test cases. Numerical dissipation is well-controlled, being comparable or lower than in existing state-of-the-art structured grid models.
A simple axiomatic characterization of the general (infinite dimensional, noncommutative) Ito algebra is given and a pseudo-Euclidean fundamental representation for such algebra is described. The notion of Ito B*-algebra, generalizing the C*-algebra is defined to include the Banach infinite dimensional Ito algebras of quantum Brownian and quantum Levy motion, and the B*-algebras of vacuum and thermal quantum noise are characterized. It is proved that every Ito algebra is canonically decomposed into the orthogonal sum of quantum Brownian (Wiener) algebra and quantum Levy (Poisson) algebra. In particular, every quantum thermal noise is the orthogonal sum of a quantum Wiener noise and a quantum Poisson noise as it is stated by the Levy-Khinchin theorem in the classical case.
This paper presents a two-phase protein folding optimization on a three-dimensional AB off-lattice model. The first phase is responsible for forming conformations with a good hydrophobic core or a set of compact hydrophobic amino acid positions. These conformations are forwarded to the second phase, where an accurate search is performed with the aim of locating conformations with the best energy value. The optimization process switches between these two phases until the stopping condition is satisfied. An auxiliary fitness function was designed for the first phase, while the original fitness function is used in the second phase. The auxiliary fitness function includes an expression about the quality of the hydrophobic core. This expression is crucial for leading the search process to the promising solutions that have a good hydrophobic core and, consequently, improves the efficiency of the whole optimization process. Our differential evolution algorithm was used for demonstrating the efficiency of two-phase optimization. It was analyzed on well-known amino acid sequences that are used frequently in the literature. The obtained experimental results show that the employed two-phase optimization improves the efficiency of our algorithm significantly and that the proposed algorithm is superior to other state-of-the-art algorithms.
Visual working memory (VWM) allows us to actively store, update and manipulate visual information surrounding us. While the underlying neural mechanisms of VWM remain unclear, contralateral delay activity (CDA), a sustained negativity over the hemisphere contralateral to the positions of visual items to be remembered, is often used to study VWM. To investigate if the CDA is a robust neural correlate for VWM tasks, we reproduced eight CDA-related studies with a publicly accessible EEG dataset. We used the raw EEG data from these eight studies and analyzed all of them with the same basic pipeline to extract CDA. We were able to reproduce the results from all the studies and show that with a basic automated EEG pipeline we can extract a clear CDA signal. We share insights from the trends observed across the studies and raise some questions about the CDA decay and the CDA during the recall phase, which surprisingly, none of the eight studies did address. Finally, we also provide reproducibility recommendations based on our experience and challenges in reproducing these studies.
News recommendation is one of the most challenging tasks in recommender systems, mainly due to the ephemeral relevance of news to users. As social media, and particularly microblogging applications like Twitter or Weibo, gains popularity as platforms for news dissemination, personalized news recommendation in this context becomes a significant challenge. We revisit news recommendation in the microblogging scenario, by taking into consideration social interactions and observations tracing how the information that is up for recommendation spreads in an underlying network. We propose a deep-learning based approach that is diffusion and influence-aware, called Influence-Graph News Recommender (IGNiteR). It is a content-based deep recommendation model that jointly exploits all the data facets that may impact adoption decisions, namely semantics, diffusion-related features pertaining to local and global influence among users, temporal attractiveness, and timeliness, as well as dynamic user preferences. To represent the news, a multi-level attention-based encoder is used to reveal the different interests of users. This news encoder relies on a CNN for the news content and on an attentive LSTM for the diffusion traces. For the latter, by exploiting previously observed news diffusions (cascades) in the microblogging medium, users are mapped to a latent space that captures potential influence on others or susceptibility of being influenced for news adoptions. Similarly, a time-sensitive user encoder enables us to capture the dynamic preferences of users with an attention-based bidirectional LSTM. We perform extensive experiments on two real-world datasets, showing that IGNiteR outperforms the state-of-the-art deep-learning based news recommendation methods.
This document defines the high level metadata necessary to describe the physical parameter space of observed or simulated astronomical data sets, such as 2D-images, data cubes, X-ray event lists, IFU data, etc.. The Characterisation data model is an abstraction which can be used to derive a structured description of any relevant data and thus to facilitate its discovery and scientific interpretation. The model aims at facilitating the manipulation of heterogeneous data in any VO framework or portal. A VO Characterisation instance can include descriptions of the data axes, the range of coordinates covered by the data, and details of the data sampling and resolution on each axis. These descriptions should be in terms of physical variables, independent of instrumental signatures as far as possible. Implementations of this model has been described in the IVOA Note available at: http://www.ivoa.net/Documents/latest/ImplementationCharacterisation.html Utypes derived from this version of the UML model are listed and commented in the following IVOA Note: http://www.ivoa.net/Documents/latest/UtypeListCharacterisationDM.html An XML schema has been build up from the UML model and is available at: http://www.ivoa.net/xml/Characterisation/Characterisation-v1.11.xsd
Attribution modelling lies at the heart of marketing effectiveness, yet most existing approaches depend on user-level path data, which are increasingly inaccessible due to privacy regulations and platform restrictions. This paper introduces a Causal-Driven Attribution (CDA) framework that infers channel influence using only aggregated impression-level data, avoiding any reliance on user identifiers or click-path tracking. CDA integrates temporal causal discovery (using PCMCI) with causal effect estimation via a Structural Causal Model to recover directional channel relationships and quantify their contributions to conversions. Using large-scale synthetic data designed to replicate real marketing dynamics, we show that CDA achieves an average relative RMSE of 9.50% when given the true causal graph, and 24.23% when using the predicted graph, demonstrating strong accuracy under correct structure and meaningful signal recovery even under structural uncertainty. CDA captures cross-channel interdependencies while providing interpretable, privacy-preserving attribution insights, offering a scalable and future-proof alternative to traditional path-based models.
The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Since Romansh LID should also be able to recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, this makes for a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
We propose, analyze, and test an efficient splitting iteration for solving the incompressible, steady Navier-Stokes equations in the setting where partial solution data is known. The (possibly noisy) solution data is incorporated into a Picard-type solver via continuous data assimilation (CDA). Efficiency is gained over the usual Picard iteration through an algebraic splitting of Yosida-type that produces easier linear solves, and accuracy/consistency is shown to be maintained through the use of an incremental pressure and grad-div stabilization. We prove that CDA scales the Lipschitz constant of the associated fixed point operator by $H^{1/2}$, where $H$ is the characteristic spacing of the known solution data. This implies that CDA accelerates an already converging solver (and the more data, the more acceleration) and enables convergence of solvers in parameter regimes where the solver would fail (and the more data, the larger the parameter regime). Numerical tests illustrate the theory on several benchmark test problems and show that the proposed efficient solver gives nearly identical results in terms of number of iterations to converge; in other words, the proposed solver gives an efficiency gain with no loss in convergence rate.
We study a three-dimensional abelian lattice model in which the analogue of a theta term can be defined. This term is defined by introducing a neutral scalar field and its effect is to couple magnetic monopoles to the scalar field and vortices to the gauge field. An interesting feature of this model is the presence of an exact duality symmetry that acts on a three parameter space. It is shown that this model has an interesting phase structure for non-zero values of theta. In addition to the usual confinement and vortex phases there are phases in which loops with composite charges condense. The presence of novel point like excitations also alters the physical properties of the system.
DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented in a system called MeTeoR. Our approach effectively combines an optimised (but generally non-terminating) materialisation (a.k.a. forward chaining) procedure, which provides scalable behaviour, with an automata-based component that guarantees termination and completeness. To ensure favourable scalability of the materialisation component, we propose a novel seminaïve materialisation procedure for DatalogMTL enjoying the non-repetition property, which ensures that each specific rule application will be considered at most once throughout the entire execution of the algorithm. Moreover, our materialisation procedure is enhanced with additional optimisations which further reduce the number of redundant computations performed during materialisation by disregarding rules as soon as it is certain that they cannot derive new facts in subsequent materialisation steps. Our extensive evaluation supports the practicality of our approach.
All online sharing systems gather data that reflects users' collective behaviour and their shared activities. This data can be used to extract different kinds of relationships, which can be grouped into layers, and which are basic components of the multidimensional social network proposed in the paper. The layers are created on the basis of two types of relations between humans, i.e. direct and object-based ones which respectively correspond to either social or semantic links between individuals. For better understanding of the complexity of the social network structure, layers and their profiles were identified and studied on two, spanned in time, snapshots of the Flickr population. Additionally, for each layer, a separate strength measure was proposed. The experiments on the Flickr photo sharing system revealed that the relationships between users result either from semantic links between objects they operate on or from social connections of these users. Moreover, the density of the social network increases in time. The second part of the study is devoted to building a social recommender system that supports the creation of new relations between users in a multimedia sharing system. Its main goal is to generate personalized suggestions that are continuously adapted to users' needs depending on the personal weights assigned to each layer in the multidimensional social network. The conducted experiments confirmed the usefulness of the proposed model.
Social media plays a central role in shaping public opinion and behavior, yet performing experiments on these platforms and, in particular, on feed algorithms is becoming increasingly challenging. This guide offers practical recommendations for researchers developing and deploying field experiments focused on real-time reranking of social media feeds. The article is organized around two contributions. First, we provide an overview of an experimental method using web browser extensions that intercepts and reranks content in real time, enabling naturalistic reranking field experiments. We then describe feed interventions and measurements that this paradigm enables on participants' actual feeds, without requiring the involvement of social media platforms. Second, we offer concrete technical recommendations for intercepting and reranking social media feeds with minimal user-facing delay, and provide an open-source implementation. This document aims to summarize lessons learned in running field experiments on social media, provide concrete implementation details, and foster the ecosystem of independent social media research. Finally, we release the source code that serves as a blueprint for implementing future feed-ranking experiments.
The plague of false information, popularly called fake news, has affected the lives of news consumers ever since the prevalence of social media. Thus understanding the spread of false information in social networks has gained a lot of attention in the literature. While most proposed models do content analysis of the information, not much work has been done exploring the community structures, which also play an important role in determining how people get exposed to it. In this paper we base our idea on Computational Trust in social networks to propose a novel Community Health Assessment model against fake news. Based on the concepts of neighbor, boundary and core nodes of a community, we propose novel evaluation metrics to quantify the vulnerability of nodes (individual-level) and communities (group-level) to spreading false information. Our model hypothesizes that if the boundary nodes trust the neighbor nodes of a community who are spreaders, the densely-connected core nodes of the community are highly likely to become spreaders. We test our model with communities generated using three popular community detection algorithms, based on two new datasets of information spreading networks collected from Twitter. Our experimental results show that the proposed metrics perform clearly better on the networks spreading false information than on those spreading true ones, indicating our community health assessment model is effective.
Given an integer array A, the prefix-sum problem is to answer sum(i) queries that return the sum of the elements in A[0..i], knowing that the integers in A can be changed. It is a classic problem in data structure design with a wide range of applications in computing from coding to databases. In this work, we propose and compare several and practical solutions to this problem, showing that new trade-offs between the performance of queries and updates can be achieved on modern hardware.
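As one classic point in the query/update trade-off space that this line of work benchmarks, a textbook Fenwick (binary indexed) tree answers prefix-sum queries and point updates in O(log n) each; the engineered variants in the paper push beyond this baseline.

```python
# A textbook Fenwick (binary indexed) tree: O(log n) point updates and
# O(log n) prefix-sum queries.
class Fenwick:
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):            # A[i] += delta
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):                   # sum of A[0..i], inclusive
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

f = Fenwick([3, 1, 4, 1, 5])
print(f.sum(2))    # 3 + 1 + 4 = 8
f.add(1, 10)       # A[1] becomes 11
print(f.sum(2))    # 3 + 11 + 4 = 18
```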
Bots have been playing a crucial role in online platform ecosystems, as efficient and automatic tools to generate content and diffuse information to the social media human population. In this chapter, we will discuss the role of social bots in content spreading dynamics in social media. In particular, we will first investigate some differences between diffusion dynamics of content generated by bots, as opposed to humans, in the context of political communication, then study the characteristics of bots behind the diffusion dynamics of social media spam campaigns.
In 2016, a network of social media accounts animated by Russian operatives attempted to divert political discourse within the American public around the presidential elections. This was a coordinated effort, part of a Russian-led complex information operation. Utilizing the anonymity and reach of social media platforms, Russian operatives created an online astroturf network in direct contact with regular Americans, promoting Russian agendas and goals. The elusiveness of this type of adversarial approach rendered security agencies helpless, stressing the unique challenges this type of intervention presents. Building on existing scholarship on the functions within influence networks on social media, we suggest a new approach to mapping these types of operations. We argue that pretending to be legitimate social actors obliges the network to adhere to social expectations, leaving a social footprint. To test the robustness of this social footprint, we train artificial intelligence to identify it and create a predictive model. We use Twitter data identified as part of the Russian influence network to train the artificial intelligence and to test the predictions. Our model attains 88% prediction accuracy on the test set. Testing our predictions on two additional models results in 90.7% and 90.5% accuracy, validating our model. The predictive and validation results suggest that building a machine learning model around social functions within the Russian influence network can be used to map its actors and functions.
Although beneficial information abounds on social media, the dissemination of harmful information such as so-called "fake news" has become a serious issue. Therefore, many researchers have devoted considerable effort to limiting the diffusion of harmful information. A promising approach to limiting the diffusion of such information is link deletion in social networks. Link deletion methods have been shown to be effective in reducing the size of information diffusion cascades generated by synthetic models on a given social network. In this study, we evaluate the effectiveness of link deletion methods by using actual logs of retweet cascades, rather than synthetic diffusion models. Our results show that even after deleting 10%-50% of the links from a social network, the size of cascades after link deletion is estimated to be only 50% of the original size under an optimistic estimation, which suggests that the effectiveness of the link deletion strategy for suppressing information diffusion is limited. Moreover, our results also show that there is a considerable number of cascades with many seed users, which renders link deletion methods inefficient.
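The evaluation idea (delete a fraction of links, then ask how much of an observed cascade remains reachable from its seed users) can be mocked up as follows; this is an illustrative sketch on synthetic data, not the paper's estimator, and the graph, seeds, and cascade are hypothetical.

```python
# Illustrative sketch: measure the residual reachability of a cascade after
# deleting a random fraction of links from a stand-in follower graph.
import random
import networkx as nx

def residual_cascade_fraction(G, seeds, cascade_nodes, delete_frac, rng):
    """Fraction of a cascade's nodes still reachable from its seeds after deletion."""
    H = G.copy()
    edges = list(H.edges())
    rng.shuffle(edges)
    H.remove_edges_from(edges[: int(delete_frac * len(edges))])
    reachable = set()
    for s in seeds:
        reachable |= nx.descendants(H, s) | {s}
    return len(reachable & set(cascade_nodes)) / len(cascade_nodes)

rng = random.Random(0)
G = nx.gnp_random_graph(200, 0.03, seed=1, directed=True)    # stand-in follower graph
cascade = list(range(50))                                     # hypothetical retweeters
print(residual_cascade_fraction(G, seeds=[0, 1], cascade_nodes=cascade,
                                delete_frac=0.3, rng=rng))
```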
Politicization is a social phenomenon studied by political science, characterized by the extent to which ideas and facts are given a political tone. A range of topics, such as climate change, religion and vaccines, has been subject to increasing politicization in the media and on social media platforms. In this work, we propose a computational method for assessing politicization in online conversations based on topic shifts, i.e., the degree to which people switch topics in online conversations. The intuition is that topic shifts from a non-political topic to politics are a direct measure of politicization (making something political), and that the more people switch conversations to politics, the more they perceive politics as playing a vital role in their daily lives. A fundamental challenge that must be addressed when studying politicization in social media is that, a priori, any topic may be politicized. Hence, keyword-based methods, or even machine learning approaches that rely on topic labels to classify topics, are expensive to run and potentially ineffective. Instead, we learn from a seed of political keywords and use Positive-Unlabeled (PU) Learning to detect political comments in reaction to non-political news articles posted on Twitter, YouTube, and TikTok during the 2022 Brazilian presidential elections. Our findings indicate that all platforms show evidence of politicization, as discussion around topics adjacent to politics, such as the economy, crime and drugs, tends to shift to politics. Even for the least politicized topics, the rate at which conversations shifted to politics increased in the lead-up to the elections and after other political events in Brazil, which is further evidence of politicization.
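A minimal PU-learning sketch in the spirit of Elkan and Noto's classic approach (not the paper's actual pipeline; the example texts, TF-IDF features, and logistic regression classifier are assumptions made purely for illustration) looks like this:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed positives (known political comments) and unlabeled comments (illustrative).
positives = ["election fraud claims again", "vote for the new president",
             "congress passes the bill"]
unlabeled = ["gas prices keep rising", "new samba album released",
             "the president is to blame for inflation", "local team wins the derby"]

texts = positives + unlabeled
y = np.array([1] * len(positives) + [0] * len(unlabeled))   # 0 = unlabeled, not negative

X = TfidfVectorizer().fit_transform(texts)
clf = LogisticRegression().fit(X, y)

# Labeling frequency c: average score of known positives (a held-out positive
# set would normally be used; reusing training positives keeps the sketch short).
c = clf.predict_proba(X[y == 1])[:, 1].mean()
p_political = clf.predict_proba(X[y == 0])[:, 1] / c        # adjusted P(political)
print(np.round(np.clip(p_political, 0, 1), 2))
```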
In the process of rewriting large physics codes at Los Alamos National Laboratory to perform well on new architectures such as many-core, GPU, and Intel MIC, we have found a number of areas in which sustainable software practices can provide significant advantages. We describe several specific advantages of sustainable practices for future architectures, and report on two small experimental projects at LANL intended to raise awareness of new software practices and programming approaches for new architectures.
A better understanding of the behavior of tourists is strategic for improving services in the competitive and important economic segment of global tourism. Critical studies in the literature often explore the issue using traditional data, such as questionnaires or interviews. Traditional approaches provide precious information; however, they make it challenging to obtain large-scale data, which makes it hard to study worldwide patterns. Location-based social networks (LBSNs) can potentially mitigate such issues due to the relatively low cost of acquiring large amounts of behavioral data. Nevertheless, before using such data to study tourists' behavior, it is necessary to verify whether the information adequately reveals the behavior measured with traditional data, which is considered the ground truth. Thus, the present work investigates in which countries the global tourism network measured with an LBSN agreeably reflects the behavior estimated by the World Tourism Organization using traditional methods. Although we found exceptions, the results suggest that, for most countries, LBSN data can satisfactorily represent the behavior studied. This indicates that, in countries with high correlations between the results obtained from both datasets, LBSN data can be used in research on tourist mobility in the studied context.
The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze the massive data streams it produces. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of a meme, i.e., a unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured by existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between the number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
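The similarity-combination step can be illustrated with a toy sketch (not the paper's system; the messages, hashtags, equal weighting, and choice of hierarchical clustering are all assumptions made for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

messages = ["Climate bill passes the senate #climate",
            "Senate approves the climate bill #climate",
            "Cute cat video of the day #cats",
            "My cat has been sleeping all day #cats"]
hashtags = [{"climate"}, {"climate"}, {"cats"}, {"cats"}]

# Content similarity (TF-IDF cosine) and metadata similarity (hashtag Jaccard).
content_sim = cosine_similarity(TfidfVectorizer().fit_transform(messages))
tag_sim = np.array([[len(a & b) / max(len(a | b), 1) for b in hashtags] for a in hashtags])

similarity = 0.5 * content_sim + 0.5 * tag_sim     # simple pairwise combination
distance = np.clip(1.0 - similarity, 0.0, None)

labels = fcluster(linkage(squareform(distance, checks=False), method="average"),
                  t=2, criterion="maxclust")
print(labels)   # messages about the same proto-meme share a cluster label
```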
This paper presents a social recommender system that supports the creation of new relations between users in a multimedia sharing system. To generate suggestions, a new concept of a multirelational social network was introduced. It covers both direct and object-based relationships that reflect social and semantic links between users. The main goal of the new method is to create personalized suggestions that are continuously adapted to users' needs, depending on the personal weights assigned to each layer of the social network. The conducted experiments confirmed the usefulness of the proposed model.
Stress and depression are prevalent nowadays across people of all ages due to the fast pace of life. People use social media to express their feelings, so social media constitute a valuable source of information for the early detection of stress and depression. Although many studies have targeted the early recognition of stress and depression, limitations remain. Multi-task learning settings have been proposed that use depression and emotion (or figurative language) as the primary and auxiliary tasks, respectively. However, although stress is inextricably linked with depression, researchers have treated the two as separate tasks. To address these limitations, we present the first study to exploit two different datasets collected under different conditions and introduce two multi-task learning frameworks that use depression and stress as the main and auxiliary tasks, respectively. Specifically, we use a depression dataset and a stress dataset containing stressful posts from ten subreddits across five domains. In the first approach, each post passes through a shared BERT layer that is updated by both tasks; two separate BERT encoder layers are then used, each updated by its own task. The second approach consists of shared and task-specific layers weighted by attention fusion networks. We conduct a series of experiments and compare our approaches with existing research initiatives, single-task learning, and transfer learning. The experiments show multiple advantages of our approaches over state-of-the-art ones.
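The shared-plus-task-specific design of the first approach can be sketched as follows. This is an architectural sketch only: the class names and layer sizes are assumptions, generic transformer blocks stand in for the paper's task-specific BERT encoder layers, and `bert-base-uncased` is an illustrative checkpoint.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SharedPrivateMultiTask(nn.Module):
    def __init__(self, model_name="bert-base-uncased", n_depression=2, n_stress=2):
        super().__init__()
        self.shared = AutoModel.from_pretrained(model_name)     # updated by both tasks
        dim = self.shared.config.hidden_size
        make_block = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                                        batch_first=True)
        self.depression_layer, self.stress_layer = make_block(), make_block()
        self.depression_head = nn.Linear(dim, n_depression)
        self.stress_head = nn.Linear(dim, n_stress)

    def forward(self, input_ids, attention_mask, task):
        h = self.shared(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        if task == "depression":                                 # main task
            return self.depression_head(self.depression_layer(h)[:, 0])
        return self.stress_head(self.stress_layer(h)[:, 0])      # auxiliary task

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SharedPrivateMultiTask()
batch = tok(["I feel hopeless lately", "deadline panic at work"],
            return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"], task="depression")
print(logits.shape)    # torch.Size([2, 2])
```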
The dynamic character of most social networks requires modeling network evolution in order to enable complex analysis of their dynamics. This paper focuses on defining differences between network snapshots by means of the Graph Differential Tuple. These differences make it possible to calculate diverse distance measures as well as to investigate the speed of changes. Four separate measures are proposed in the paper, together with an experimental study on real social network data.
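One plausible reading of such a snapshot difference (an illustrative sketch, not the paper's formal definition) is the tuple of added and removed nodes and edges, from which a simple distance can be derived:

```python
import networkx as nx

def graph_differential_tuple(G1, G2):
    """(nodes added, nodes removed, edges added, edges removed) between snapshots."""
    n1, n2 = set(G1.nodes()), set(G2.nodes())
    e1 = {frozenset(e) for e in G1.edges()}
    e2 = {frozenset(e) for e in G2.edges()}
    return n2 - n1, n1 - n2, e2 - e1, e1 - e2

def snapshot_distance(G1, G2):
    """Jaccard-style distance over the edge sets of two snapshots."""
    e1 = {frozenset(e) for e in G1.edges()}
    e2 = {frozenset(e) for e in G2.edges()}
    return 1 - len(e1 & e2) / len(e1 | e2) if (e1 | e2) else 0.0

G_t0 = nx.Graph([("a", "b"), ("b", "c")])
G_t1 = nx.Graph([("a", "b"), ("c", "d")])
print(graph_differential_tuple(G_t0, G_t1))   # node "d" and edge {c,d} appear, edge {b,c} disappears
print(snapshot_distance(G_t0, G_t1))          # 2/3 of the edge union changed
```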
This paper proposes a new method of feature extraction in social networks for within-network classification. The method provides new features calculated by combining network structure information with the class labels assigned to nodes. The influence of the various features on classification performance has also been studied. Experiments on real-world data show that features created with the proposed method can lead to significant improvements in classification accuracy.
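One simple member of this feature family (a minimal sketch; the exact features in the paper may differ) is, for each node, the fraction of its already-labeled neighbors carrying each class label:

```python
import networkx as nx

def neighbor_label_features(G, labels, classes):
    """For each node, the fraction of its labeled neighbors carrying each class label."""
    features = {}
    for v in G.nodes():
        labeled = [labels[u] for u in G.neighbors(v) if u in labels]
        total = max(len(labeled), 1)
        features[v] = [sum(1 for c in labeled if c == k) / total for k in classes]
    return features

G = nx.karate_club_graph()
labels = {0: "A", 1: "A", 33: "B", 32: "B"}      # the few nodes with known labels
feats = neighbor_label_features(G, labels, classes=["A", "B"])
print(feats[2])   # node 2 touches labeled nodes of both classes: roughly [0.67, 0.33]
```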
Coordinated online behaviors are an essential part of information and influence operations, as they allow disinformation to spread more effectively. Most studies on coordinated behaviors have involved manual investigations, and the few existing computational approaches make bold assumptions or oversimplify the problem to make it tractable. Here, we propose a new network-based framework for uncovering and studying coordinated behaviors on social media. Our research extends existing systems and goes beyond binary classifications of coordinated and uncoordinated behaviors. It allows exposing different coordination patterns and estimating the degree of coordination that characterizes diverse communities. We apply our framework to a dataset collected during the 2019 UK General Election, detecting and characterizing coordinated communities that participated in the electoral debate. Our work conveys both theoretical and practical implications and provides more nuanced and fine-grained results for studying online information manipulation.
In this paper, we address the challenge of discovering hidden nodes in unknown social networks, formulating three types of hidden-node discovery problems, namely Sybil-node discovery, peripheral-node discovery, and influencer discovery. We tackle these problems by employing a graph exploration framework grounded in machine learning. Leveraging the structure of the subgraph gradually obtained from graph exploration, we construct prediction models to identify target hidden nodes in unknown social graphs. Through empirical studies on real social graphs, we investigate the efficiency of graph exploration strategies in uncovering hidden nodes. Our results show that our graph exploration strategies discover hidden nodes with an efficiency comparable to that achieved when the graph structure is known. Specifically, the query cost of discovering 10% of the hidden nodes is at most only 1.2 times that when the topology is known, and the query-cost multiplier for discovering 90% of the hidden nodes is at most only 1.4. Furthermore, our results suggest that using node embeddings, which are low-dimensional vector representations of nodes, for hidden-node discovery is a double-edged sword: it is effective in certain scenarios but sometimes degrades the efficiency of node discovery. Guided by this observation, we examine the effectiveness of using a bandit algorithm to combine the prediction models that use node embeddings with those that do not, and our analysis shows that the bandit-based graph exploration strategy achieves efficient node discovery across a wide array of settings.
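The bandit idea of switching between model families during exploration can be illustrated with a minimal epsilon-greedy sketch; the paper's actual bandit strategy and models may differ, and the two stand-in models below are hypothetical.

```python
import random

def epsilon_greedy_exploration(models, is_hidden, budget, eps=0.1, seed=0):
    """Pick a prediction model per query; reward it when the queried node is hidden."""
    rng = random.Random(seed)
    reward = {name: 0.0 for name in models}
    pulls = {name: 1e-9 for name in models}           # avoid division by zero
    discovered = 0
    for _ in range(budget):
        if rng.random() < eps:                        # explore
            name = rng.choice(list(models))
        else:                                         # exploit the best average reward
            name = max(models, key=lambda m: reward[m] / pulls[m])
        node = models[name](rng)                      # the chosen model proposes a node
        hit = is_hidden(node)
        reward[name] += hit
        pulls[name] += 1
        discovered += hit
    return discovered, {name: int(pulls[name]) for name in models}

# Hypothetical stand-ins: the "structural" model proposes better candidates here.
models = {"embedding": lambda rng: rng.randint(0, 99),
          "structural": lambda rng: rng.randint(0, 19)}
print(epsilon_greedy_exploration(models, is_hidden=lambda n: n < 10, budget=200))
```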
Environmental, Social, and Governance (ESG) is a widely used metric that measures the sustainability of a company's practices. Currently, ESG is determined using self-reported corporate filings, which allows companies to portray themselves in an artificially positive light. As a result, ESG evaluation is subjective and inconsistent across raters, giving executives mixed signals on what to improve. This project aims to create a data-driven ESG evaluation system that can provide better guidance and more systematic scores by incorporating social sentiment. Social sentiment allows for more balanced perspectives that directly highlight public opinion, helping companies create more focused and impactful initiatives. To build this, Python web scrapers were developed to collect data from Wikipedia, Twitter, LinkedIn, and Google News for the S&P 500 companies. The data were then cleaned and passed through NLP algorithms to obtain sentiment scores for ESG subcategories. Using these features, machine-learning algorithms were trained and calibrated against S&P Global ESG Ratings to test their predictive capabilities. The random forest model was the strongest, with a mean absolute error of 13.4% and a correlation of 26.1% (p-value 0.0372), showing encouraging results. Overall, measuring ESG social sentiment across subcategories can help executives focus efforts on the areas people care about most. Furthermore, this data-driven methodology can provide ratings for companies without coverage, allowing more socially responsible firms to thrive.
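The final modeling step (regress sentiment-derived features against reference ratings and report MAE and Pearson correlation) can be mocked up on synthetic data as follows; the features and targets below are fabricated purely for illustration and bear no relation to the project's results.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))            # e.g. sentiment scores per ESG subcategory
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=8, size=500)  # reference rating

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("MAE:", round(mean_absolute_error(y_te, pred), 2))
print("Pearson r:", round(pearsonr(y_te, pred)[0], 2))
```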
How complex is an Ising model? Usually, this is measured by the computational complexity of its ground state energy problem. Yet, this complexity measure only distinguishes between planar and non-planar interaction graphs, and thus fails to capture properties such as the average node degree, the number of long range interactions, or the dimensionality of the lattice. Herein, we introduce a new complexity measure for Ising models and thoroughly classify Ising models with respect to it. Specifically, given an Ising model we consider the decision problem corresponding to the function graph of its Hamiltonian, and classify this problem in the Chomsky hierarchy. We prove that the language of this decision problem is (i) regular if and only if the Ising model is finite, (ii) constructive context free if and only if the Ising model is linear and its edge language is regular, (iii) constructive context sensitive if and only if the edge language of the Ising model is context sensitive, and (iv) decidable if and only if the edge language of the Ising model is decidable. We apply this theorem to show that the 1d Ising model, the Ising model on generalised ladder graphs, and the Ising model on layerwise complete graphs are constructive context free, while the 2d Ising model, the all-to-all Ising model, and the Ising model on perfect binary trees are constructive context sensitive. This work is a first step in the characterisation of physical interactions in terms of grammars.
Recent progress in deep learning and natural language processing has given rise to powerful models that are primarily trained on a cloze-like task and show some evidence of having access to substantial linguistic information, including some constructional knowledge. This groundbreaking discovery presents an exciting opportunity for a synergistic relationship between computational methods and Construction Grammar research. In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models. We touch upon the first two approaches as a contextual foundation for the use of computational methods before providing an accessible, yet comprehensive overview of deep learning models, which also addresses reservations construction grammarians may have. Additionally, we delve into experiments that explore the emergence of constructionally relevant information within these models while also examining the aspects of Construction Grammar that may pose challenges for these models. This chapter aims to foster collaboration between researchers in the fields of natural language processing and Construction Grammar. By doing so, we hope to pave the way for new insights and advancements in both these fields.
Given a monotone convex function on the space of essentially bounded random variables with the Lebesgue property (order continuity), we consider its extension, preserving the Lebesgue property, to as large a solid vector space of random variables as possible. We show that there exists a maximum such extension, with an explicit construction, where the maximum domain of extension is obtained as a (possibly proper) subspace of a natural Orlicz-type space, characterized by a certain uniform integrability property. As an application, we provide a characterization of the Lebesgue property of monotone convex functions on arbitrary solid spaces of random variables in terms of uniform integrability and a "nice" dual representation of the function.
In this paper, we present an open-source parsing environment (Tuebingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
In this chapter, we argue that it is highly beneficial for the contemporary construction grammarian to have a thorough understanding of the strong relationship between the research fields of construction grammar and artificial intelligence. We start by unravelling the historical links between the two fields, showing that their relationship is rooted in a common attitude towards human communication and language. We then discuss the first direction of influence, focussing in particular on how insights and techniques from the field of artificial intelligence play an important role in operationalising, validating and scaling constructionist approaches to language. We then proceed to the second direction of influence, highlighting the relevance of construction grammar insights and analyses to the artificial intelligence endeavour of building truly intelligent agents. We support our case with a variety of illustrative examples and conclude that the further elaboration of this relationship will play a key role in shaping the future of the field of construction grammar.
Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast, our approach explicitly represents these interactions in a dense and continuous manner. To this end, we propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR, which jointly learns the distribution of transitions in both pose and interaction between every joint and ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and distance change towards the ground. After training, we establish a joint optimization strategy that utilizes GraMMaR as a dual-prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of the assumed or learned ground plane. Through extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminating abilities in challenging cases including complex and ambiguous human-ground interactions. The code will be available at https://github.com/xymsh/GraMMaR.
This article presents a construction of the concept of stochastic integration in Riemannian manifolds from a purely functional-analytic point of view. We show that there are infinitely many such integrals, and that any two of them are related by a simple formula. We also find that the Stratonovich and Itô integrals known to probability theorists are two instances of the general concept constructed herein.
Stabler proposes an implementation of the Chomskyan Minimalist Program (Chomsky 95) with Minimalist Grammars (MG; Stabler 97). This framework inherits a long linguistic tradition. But the semantic calculus is more easily added if one uses the Curry-Howard isomorphism. Minimalist Categorial Grammars (MCG), based on an extension of the Lambek calculus (the mixed logic), were introduced to provide a theoretically motivated syntax-semantics interface (Amblard 07). In this article, we give full definitions of MG with algebraic tree descriptions and of MCG, and take the first steps towards a proof of the inclusion of their generated languages.
A remarkable result by S. Artstein-Avidan and V. Milman states that, up to pre-composition with affine operators, addition of affine functionals, and multiplication by positive scalars, the only fully order preserving mapping acting on the class of lower semicontinuous proper convex functions defined on $\mathbb{R}^n$ is the identity operator, and the only fully order reversing one acting on the same set is the Fenchel conjugation. Here fully order preserving (reversing) mappings are understood to be those which preserve (reverse) the pointwise order among convex functions, are invertible, and such that their inverses also preserve (reverse) such order. In this paper we establish a suitable extension of these results to order preserving and order reversing operators acting on the class of lower semicontinuous proper convex functions defined on arbitrary infinite dimensional Banach spaces.
In this paper we describe our experiences with a tool for the development and testing of natural language grammars called GTU (German: Grammatik-Testumgebung; grammar test environment). GTU supports four grammar formalisms under a window-oriented user interface. Additionally, it contains a set of German test sentences covering various syntactic phenomena as well as three types of German lexicons that can be attached to a grammar via an integrated lexicon interface. What follows is a description of the experiences we gained when we used GTU as a tutoring tool for students and as an experimental tool for CL researchers. From these we will derive the features necessary for a future grammar workbench.
Existing syntactic grammars of natural languages, even with a far from complete coverage, are complex objects. Assessments of the quality of parts of such grammars are useful for the validation of their construction. We evaluated the quality of a grammar of French determiners that takes the form of a recursive transition network. The result of the application of this local grammar gives deeper syntactic information than chunking or information available in treebanks. We performed the evaluation by comparison with a corpus independently annotated with information on determiners. We obtained 86% precision and 92% recall on text not tagged for parts of speech.
Sequential Constraint Grammar (SCG) (Karlsson, 1990) and its extensions have lacked clear connections to formal language theory. The purpose of this article is to lay a foundation for these connections by simplifying the definition of the strings processed by the grammar and by showing that Nonmonotonic SCG is undecidable and that derivations similar to those of Generative Phonology exist. The current investigation proposes resource bounds that restrict the generative power of SCG to a subset of context-sensitive languages and presents a strong finite-state condition for grammars as wholes. We show that a grammar is equivalent to a finite-state transducer if it is implemented with a Turing machine that runs in o(n log n) time. This condition opens new finite-state hypotheses and avenues for deeper analysis of SCG instances, in a way inspired by Finite-State Phonology.
We study square integrable functions on the metaplectic group and functions on the space of unitary symmetric matrices. We relate them using the oscillator representations.
The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically. Thus, many researchers are shifting their attention to higher granularity, aiming to profile entire news outlets, which makes it possible to detect likely "fake news" the moment it is published, by simply checking the reliability of its source. Source factuality is also an important element of systems for automatic fact-checking and "fake news" detection, as they need to assess the reliability of the evidence they retrieve online. Political bias detection, which in the Western political landscape is about predicting left-center-right bias, is an equally important topic, which has experienced a similar shift towards profiling entire news outlets. Moreover, there is a clear connection between the two, as highly biased media are less likely to be factual; yet, the two problems have been addressed separately. In this survey, we review the state of the art on media profiling for factuality and bias, arguing for the need to model them jointly. We further discuss interesting recent advances in using different information sources and modalities, which go beyond the text of the articles the target news outlet has published. Finally, we discuss current challenges and outline future research directions.
We present a study on predicting the factuality of reporting and bias of news media. While previous work has focused on studying the veracity of claims or documents, here we are interested in characterizing entire news media. These are under-studied but arguably important research problems, both in their own right and as a prior for fact-checking systems. We experiment with a large list of news websites and with a rich set of features derived from (i) a sample of articles from the target news medium, (ii) its Wikipedia page, (iii) its Twitter account, (iv) the structure of its URL, and (v) information about the Web traffic it attracts. The experimental results show sizable performance gains over the baselines, and confirm the importance of each feature type.
Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, an understudied but increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content. This approach makes it possible to detect likely "fake news" the moment it is published, by simply checking the reliability of its source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself on Twitter) vs. (ii) who read it (i.e., analyzing the readers of the target medium on Facebook, Twitter, and YouTube). We further study (iii) what was written about the target medium on Wikipedia. The evaluation results show that what was written matters most, and that putting all information sources together yields large improvements over the current state of the art.
Large Language Models (LLMs) have demonstrated remarkable success as general-purpose task solvers across various fields. However, their capabilities remain limited when addressing domain-specific problems, particularly in downstream NLP tasks. Research has shown that models fine-tuned on instruction-based downstream NLP datasets outperform those that are not fine-tuned. While most efforts in this area have primarily focused on resource-rich languages like English and broad domains, little attention has been given to multilingual settings and specific domains. To address this gap, this study focuses on developing a specialized LLM, LlamaLens, for analyzing news and social media content in a multilingual context. To the best of our knowledge, this is the first attempt to tackle both domain specificity and multilinguality, with a particular focus on news and social media. Our experimental setup includes 18 tasks, represented by 52 datasets covering Arabic, English, and Hindi. We demonstrate that LlamaLens outperforms the current state-of-the-art (SOTA) on 23 testing sets, and achieves comparable performance on 8 sets. We make the models and resources publicly available for the research community (https://huggingface.co/collections/QCRI/llamalens-672f7e0604a0498c6a2f0fe9).
In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles. In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. This is motivated by the observation that hyper-partisanship is often linked to low trustworthiness, e.g., appealing to emotions rather than sticking to the facts, while center media tend to be generally more impartial and trustworthy. We further use several auxiliary tasks, modeling centrality, hyperpartisanship, as well as left-vs.-right bias on a coarse-grained scale. The evaluation results show sizable performance gains by the joint models over models that target the problems in isolation.
We present MediaSpin, a large-scale language resource capturing how major news outlets modify headlines after publication, and MediaSpin-in-the-Wild, a complementary dataset linking these revised headlines to their downstream engagement on social media. The increasing editability of online news headlines offers new opportunities to study linguistic framing and bias through the lens of editorial revisions. The dataset contains 78,910 headline pairs annotated for 13 types of media bias, grounded in established media-bias taxonomies, covering both subjective (e.g., sensationalism, spin) and objective (e.g., omission, slant) forms, with annotation conducted through a human-supervised large-language-model pipeline with expert validation and quality control. We describe the annotation schema and demonstrate three downstream applications: (1) cross-national analysis of how country references are added or removed during editing, (2) transformer-based bias classification at both binary and fine-grained levels, and (3) behavioral analysis of biased headlines on X (Twitter) using 180,786 news-related tweets from 819 consenting users. The results reveal regional asymmetries in representational framing, measurable linguistic markers, and consistently higher engagement with biased content. MediaSpin and MediaSpin-in-the-Wild together provide a reproducible benchmark for bias detection and the study of editorial and behavioral dynamics in contemporary media ecosystems.
Let $(X, f)$ be a topological dynamical system and $\mathcal {F}$ be a Furstenberg family (a collection of subsets of $\mathbb{N}$ with hereditary upward property). A point $x\in X$ is called an $\mathcal {F}$-transitive point if for every non-empty open subset $U$ of $X$ the entering time set of $x$ into $U$, $\{n\in \mathbb{N}: f^{n}(x) \in U\}$, is in $\mathcal {F}$; the system $(X,f)$ is called $\mathcal {F}$-point transitive if there exists some $\mathcal {F}$-transitive point. In this paper, we first discuss the connection between $\mathcal {F}$-point transitivity and $\mathcal {F}$-transitivity, and show that weakly mixing and strongly mixing systems can be characterized by $\mathcal {F}$-point transitivity, completing results in [Transitive points via Furstenberg family, Topology Appl. 158 (2011), 2221--2231]. We also show that multi-transitivity, $Δ$-transitivity and multi-minimality can also be characterized by $\mathcal {F}$-point transitivity, answering two questions proposed by Kwietniak and Oprocha [On weak mixing, minimality and weak disjointness of all iterates, Erg. Th. Dynam. Syst., 32 (2012), 1661--1672].
Extending Martín Escardó's effectful forcing technique, we give a new proof of a well-known result: Brouwer's monotone bar theorem holds for any bar that can be realized by a functional of type $(\mathbb{N} \to \mathbb{N}) \to \mathbb{N}$ in Gödel's System T. Effectful forcing is an elementary alternative to standard sheaf-theoretic forcing arguments, using ideas from programming languages, including computational effects, monads, the algebra interpretation of call-by-name $λ$-calculus, and logical relations. Our argument proceeds by interpreting System T programs as well-founded dialogue trees whose nodes branch on a query to an oracle of type $\mathbb{N}\to\mathbb{N}$, lifted to higher type along a call-by-name translation. To connect this interpretation to the bar theorem, we then show that Brouwer's famous "mental constructions" of barhood constitute an invariant form of these dialogue trees in which queries to the oracle are made maximally and in order.
We propose a categorial grammar based on classical multiplicative linear logic. This can be seen as an extension of abstract categorial grammars (ACG) and is at least as expressive. However, constituents of linear logic grammars (LLG) are not abstract $λ$-terms, but simply tuples of words with labeled endpoints, which we call multiwords. At the least, this gives a concrete and intuitive representation of ACG. A key observation is that the class of multiwords has a fundamental algebraic structure: multiwords can be organized into a category very similar to the category of topological cobordisms. This category is symmetric monoidal closed and compact closed, and thus is a model of linear $λ$-calculus and classical linear logic. We think that this category is interesting in its own right. In particular, it might provide categorical representations for other formalisms. On the other hand, many models of language semantics are based on commutative logic or, more generally, on symmetric monoidal closed categories. But the category of word cobordisms is a category of language elements which is itself symmetric monoidal closed and independent of any grammar. Thus, it might prove useful in understanding language semantics as well.
Taken together, these works form a multi-layer computational evidence framework supporting the critique of news discourse under Fairclough's three-dimensional model. At the **social practice level**, social media network and diffusion dynamics are used to identify disinformation/propaganda and the mechanisms of its actors, while methods related to politicization and manipulation characterize group-level discourse dynamics. At the **media/discourse reproduction level**, factuality and bias prediction (including cross-lingual and multi-source signals) turns "text-stance" into quantifiable reference points, and reproducible social media experiments and engineering infrastructure safeguard evidence quality. At the **text/discourse level**, actor reference identification, discourse-structure tree representations, and the automatic recognition and modeling of discourse markers and discourse-relation signals provide operational linguistic evidence for cohesion, relation types, and the workings of implicit stance. In addition, some formal-language/grammar and parsing theory serves as a toolbox for the symbolic representation of textual structure, while a small number of cross-disciplinary items with no direct relevance, listed alongside, have been set aside to avoid overlap between the groups.