Video Enhancement and Video Super-Resolution
Classical Video Super-Resolution Architectures: Propagation, Alignment, and Recurrence
This group of papers focuses on backbone network design for video super-resolution (VSR), examining how deformable convolution (DCN), implicit alignment, bidirectional recurrent networks (RNNs), and improved U-Net structures achieve efficient spatio-temporal feature aggregation. A minimal code sketch of bidirectional recurrent propagation follows the reference list below.
- BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond(Kelvin C. K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- EDVR: Video Restoration with Enhanced Deformable Convolutional Networks(Xintao Wang, Kelvin C. K. Chan, Ke Yu, Chao Dong, Chen Change Loy, 2019, ArXiv Preprint)
- A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift(Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li, 2022, ArXiv Preprint)
- BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment(Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Frame-Recurrent Video Super-Resolution(Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew A. Brown, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition)
- Recurrent Back-Projection Network for Video Super-Resolution(Muhammad Haris, Gregory Shakhnarovich, N. Ukita, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- On the Generalization of BasicVSR++ to Video Deblurring and Denoising(Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, 2022, ArXiv)
- VMG: Rethinking U-Net Architecture for Video Super-Resolution(Jun Tang, Lele Niu, Linlin Liu, Hang Dai, Yong Ding, 2025, IEEE Transactions on Broadcasting)
- Revisiting Temporal Alignment for Video Restoration(Kun Zhou, Wenbo Li, Liying Lu, Xiaoguang Han, Jiangbo Lu, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution(Yapeng Tian, Yulun Zhang, Y. Fu, Chenliang Xu, 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution(Yan Huang, Wei Wang, Liang Wang, 2015, No journal)
- Deep Unrolled Network for Video Super-Resolution(Benjamin Naoto Chiche, J. Frontera-Pons, Arnaud Woiselle, Jean-Luc Starck, 2020, 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA))
- Detail-Revealing Deep Video Super-Resolution(Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia, 2017, 2017 IEEE International Conference on Computer Vision (ICCV))
- DVSRNet: Deep Video Super-Resolution Based on Progressive Deformable Alignment and Temporal-Sparse Enhancement(Qiang Zhu, Feiyu Chen, Shuyuan Zhu, Yu Liu, Xue Zhou, Ruiqin Xiong, Bing Zeng, 2024, IEEE Transactions on Neural Networks and Learning Systems)
- Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation(Younghyun Jo, Seoung Wug Oh, Jaeyeon Kang, Seon Joo Kim, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition)
- ECL: Exclusive Curriculum Learning for Video Super-Resolution(Sichen Hu, Zhongyuan Wang, Peng Yi, Zheng He, Jinsheng Xiao, Jing Xiao, 2022, 2022 IEEE International Conference on Multimedia and Expo (ICME))
- TAM: Temporal Adaptive Module for Video Recognition(Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, Tong Lu, 2020, ArXiv Preprint)
- Feature Aggregating Network with Inter-Frame Interaction for Efficient Video Super-Resolution(Yawei Li, Zhao Zhang, Suiyi Zhao, Jicong Fan, Haijun Zhang, Mingliang Xu, 2023, 2023 IEEE International Conference on Data Mining (ICDM))
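To make the shared recurrent design concrete, here is a minimal PyTorch sketch of BasicVSR-style bidirectional propagation: a hidden state is carried backward and then forward through the clip, fused per frame, and upsampled with a sub-pixel layer. It omits flow-based alignment and uses toy layer sizes; all module and parameter names are illustrative, not any cited implementation.

```python
# Minimal sketch of bidirectional recurrent propagation for VSR (BasicVSR-style).
# Illustrative only: no optical-flow alignment, tiny layers, made-up sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalVSR(nn.Module):
    def __init__(self, channels=32, scale=4):
        super().__init__()
        self.channels, self.scale = channels, scale
        # Each propagation branch consumes [current LR frame, previous hidden state].
        self.backward_prop = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.forward_prop = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # Fuse both directions and upsample with a sub-pixel (pixel-shuffle) layer.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr_frames):                  # (B, T, 3, H, W)
        b, t, _, h, w = lr_frames.shape
        # Backward pass: carry a hidden state from the last frame to the first.
        hidden = lr_frames.new_zeros(b, self.channels, h, w)
        backward_feats = [None] * t
        for i in range(t - 1, -1, -1):
            hidden = self.backward_prop(torch.cat([lr_frames[:, i], hidden], dim=1))
            backward_feats[i] = hidden
        # Forward pass: carry a hidden state from the first frame onward, fuse, upsample.
        outputs, hidden = [], torch.zeros_like(hidden)
        for i in range(t):
            hidden = self.forward_prop(torch.cat([lr_frames[:, i], hidden], dim=1))
            feat = self.fuse(torch.cat([hidden, backward_feats[i]], dim=1))
            base = F.interpolate(lr_frames[:, i], scale_factor=self.scale,
                                 mode='bilinear', align_corners=False)
            outputs.append(base + self.upsample(feat))
        return torch.stack(outputs, dim=1)          # (B, T, 3, s*H, s*W)

video = torch.rand(1, 5, 3, 64, 64)                 # 5-frame 64x64 clip
print(BidirectionalVSR()(video).shape)               # torch.Size([1, 5, 3, 256, 256])
```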
Frontier Long-Range Modeling: Transformer and Mamba Architecture Optimization
These works use the self-attention of Transformers and the linear-complexity sequence modeling of Mamba to overcome the limitations of conventional CNNs in aligning large motions and capturing long-range spatio-temporal dependencies, improving global consistency. A minimal sketch of spatio-temporal self-attention follows the reference list below.
- VRT: A Video Restoration Transformer(Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool, 2022, ArXiv Preprint)
- Video Super-Resolution Transformer(Jie Cao, Yawei Li, K. Zhang, L. Gool, 2021, ArXiv)
- Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention(Xingyu Zhou, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, Shuhang Gu, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- CTVSR: Collaborative Spatial–Temporal Transformer for Video Super-Resolution(Jun Tang, Chenyang Lu, Zhengxue Liu, Jiale Li, Hang Dai, Yong Ding, 2024, IEEE Transactions on Circuits and Systems for Video Technology)
- Multi-Scale Video Super-Resolution Transformer With Polynomial Approximation(Fan Zhang, Gongguan Chen, Hua Wang, Jinjiang Li, Caiming Zhang, 2023, IEEE Transactions on Circuits and Systems for Video Technology)
- HAMSA: Hybrid attention transformer and multi-scale alignment aggregation network for video super-resolution(Hanguang Xiao, Hao Wen, Xin Wang, Kun Zuo, Tianqi Liu, Wei Wang, Yong Xu, 2025, Digit. Signal Process.)
- Combining optical flow and Swin Transformer for Space-Time video super-resolution(Xin Wang, Hua Wang, Mingling Zhang, Fan Zhang, 2024, Eng. Appl. Artif. Intell.)
- Temporal Transformer-Based Video Super-Resolution Reconstruction with Cross-Modal Attention(Jingming Gong, Qinfei Xu, 2025, Informatica (Slovenia))
- BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment(Ziwei Luo, Youwei Li, Shen Cheng, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Jian Sun, Shuaicheng Liu, 2022, ArXiv Preprint)
- GridFormer-VSR: A Multi-Attention Vision Transformer for Video Super-Resolution(Anas M. Ali, W. El-shafai, El-Sayed M. El-Rabaie, F. A. Abd El‑Samie, K. Ramadan, 2025, 2025 4th International Conference on Electronic Engineering (ICEEM))
- Flow-Guided Sparse Transformer for Video Deblurring(Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool, 2022, ArXiv Preprint)
- Recurrent Transformer Based Framework for Video Denoising and Super Resolution Using Optical Flow and Temporal Attention(Pitty Nagarjuna, B. Harsha, 2026, ICTACT Journal on Image and Video Processing)
- Transformer Channel Attention Network for Video Super-Resolution(Yooho Lee, Dongsan Jun, 2025, Journal of Korea Multimedia Society)
- DualX-VSR: Dual Axial Spatial⨉Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation(Shuo Cao, Yihao Liu, Xiaohui Li, Yuanting Gao, Yu Zhou, Chao Dong, 2025, ArXiv)
- Rethinking Alignment in Video Super-Resolution Transformers(Shu Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, Chao Dong, 2022, ArXiv)
- VSRM: A Robust Mamba-Based Framework for Video Super-Resolution(D. Tran, Dao Duy Hung, Daeyoung Kim, 2025, ArXiv)
- VDTR: Video Deblurring with Transformer(Mingdeng Cao, Yanbo Fan, Yong Zhang, Jue Wang, Yujiu Yang, 2022, ArXiv Preprint)
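As a concrete illustration of the long-range modeling these Transformer-based methods rely on, the following sketch applies joint spatio-temporal self-attention over a small stack of frame features using PyTorch's built-in multi-head attention. It deliberately ignores the windowing, masking, and flow guidance that make the cited models tractable; sizes and names are illustrative assumptions.

```python
# Minimal sketch: joint spatio-temporal self-attention over video features.
# Every (frame, pixel) token attends to every other one -- O((T*H*W)^2) cost,
# which is exactly what windowed/sparse designs in the papers above avoid.
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feats):                     # (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        tokens = feats.permute(0, 1, 3, 4, 2).reshape(b, t * h * w, c)
        x = self.norm(tokens)
        out, _ = self.attn(x, x, x)               # global attention across time and space
        out = tokens + out                         # residual connection
        return out.reshape(b, t, h, w, c).permute(0, 1, 4, 2, 3)

feats = torch.rand(1, 4, 32, 16, 16)              # tiny toy size keeps attention affordable
print(SpatioTemporalAttention()(feats).shape)     # torch.Size([1, 4, 32, 16, 16])
```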
Generative Video Enhancement: High-Fidelity Reconstruction with Diffusion Models and GANs
Covers recent research that applies generative diffusion models and GANs to video restoration, focusing on texture generation, one-step sampling efficiency, and the suppression of spatio-temporal flicker so that generated frames remain consistent. A schematic sketch of one-step diffusion restoration follows the reference list below.
- DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution(Zheng Chen, Zichen Zou, Kewei Zhang, Xiongfei Su, Xin Yuan, Yong Guo, Yulun Zhang, 2025, ArXiv)
- STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution(Junyang Chen, Jiangxin Dong, Long Sun, Yixin Yang, Jinshan Pan, 2025, ArXiv)
- FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution(Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue, 2025, ArXiv)
- OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion(Shuoyan Wei, Feng Li, Chen Zhou, Runmin Cong, Yao Zhao, Huihui Bai, 2026, ArXiv)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution(Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, Chen Change Loy, 2023, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration(Yizhou Li, Zihua Liu, Yusuke Monno, Masatoshi Okutomi, 2025, ArXiv)
- VideoGigaGAN: Towards Detail-rich Video Super-Resolution(Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu, 2024, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation(Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang, 2023, ArXiv)
- UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space(Yong Liu, Jinshan Pan, Yinchuan Li, Qingji Dong, Chao Zhu, Yu Guo, Fei Wang, 2025, Proceedings of the 33rd ACM International Conference on Multimedia)
- Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment(Zhihao Zhan, Wang Pang, Xiang Zhu, Yechao Bai, 2025, 2025 17th International Conference on Signal Processing Systems (ICSPS))
- STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution(Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai, 2025, ArXiv)
- MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling(Fadeel Sher Khan, Joshua Peter Ebenezer, Hamid R. Sheikh, Seok-Jun Lee, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution(Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution(Zhe Kong, Le Li, Yong Zhang, Feng Gao, Shaoshu Yang, Tao Wang, Kaihao Zhang, Zhuoliang Kang, Xiaoming Wei, Guanying Chen, Wenhan Luo, 2025, ArXiv Preprint)
- DiTVR: Zero-Shot Diffusion Transformer for Video Restoration(Sicheng Gao, Nancy Mehta, Zongwei Wu, R. Timofte, 2025, ArXiv)
- Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models(Nasrin Rahimi, A. Murat Tekalp, 2025, ArXiv)
- Multi-Frame Super-Resolution Algorithm Based on a WGAN(Keqing Ning, Zhihao Zhang, Kai Han, Siyu Han, Xiqing Zhang, 2021, IEEE Access)
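The one-step sampling idea shared by several of the diffusion methods above can be sketched schematically: instead of an iterative denoising loop, a single network evaluation maps a noised latent, conditioned on the low-resolution video, directly to the restored latent. The toy model, noise level, and shapes below are purely illustrative assumptions, not any cited method's design.

```python
# Schematic sketch of the one-step shortcut used by several diffusion-based VSR methods:
# predict the clean latent from a noised latent in a single network evaluation,
# conditioned on the low-resolution video. Everything here is a toy stand-in.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a video diffusion backbone predicting x0 from (noisy latent, LR condition)."""
    def __init__(self, channels=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2 * channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv3d(channels, channels, 3, padding=1))

    def forward(self, noisy, lr_cond):
        return self.net(torch.cat([noisy, lr_cond], dim=1))

def one_step_restore(model, lr_latent, noise_level=0.5):
    # Multi-step samplers would loop over decreasing noise levels here; one-step variants
    # start from the LR latent plus a fixed amount of noise and call the network once.
    noisy = lr_latent + noise_level * torch.randn_like(lr_latent)
    return model(noisy, lr_latent)

lr_latent = torch.rand(1, 8, 5, 32, 32)                    # (B, C, T, H, W) latent of a 5-frame clip
print(one_step_restore(TinyDenoiser(), lr_latent).shape)   # torch.Size([1, 8, 5, 32, 32])
```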
Joint Space-Time Super-Resolution (ST-VSR) and Arbitrary-Scale Resampling
Discusses single-stage methods that combine spatial super-resolution with video frame interpolation (VFI), as well as implicit neural representations (INR) for continuous video reconstruction at arbitrary spatial scales and arbitrary time points. A minimal INR decoding sketch follows the reference list below.
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution(Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Y. Fu, J. Allebach, Chenliang Xu, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution(Xiaoyu Xiang, Yapeng Tian, Yulun Zhang, Yun Fu, Jan P. Allebach, Chenliang Xu, 2021, ArXiv Preprint)
- VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising(Yuxiang Hui, Yang Liu, Yaofang Liu, Fan Jia, Jinshan Pan, Raymond H. Chan, Tieyong Zeng, 2024, ArXiv)
- Continuous Space-Time Video Resampling with Invertible Motion Steganography(Yuantong Zhang, Zhenzhong Chen, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Deep Space-Time Video Upsampling Networks(Jaeyeon Kang, Younghyun Jo, Seoung Wug Oh, Péter Vajda, Seon Joo Kim, 2020, ArXiv)
- SPSTT: Second-Order Propagation Spatial Temporal Transformer Network for Space-Time Video Super-Resolution(Yaping Qi, Rui Su, Lei Chen, Xianye Ben, Zheng Dong, Hongchao Zhou, 2023, Proceedings of the 2023 6th International Conference on Image and Graphics Processing)
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information(Yuantong Zhang, Daiqin Yang, Zhenzhong Chen, Wenpeng Ding, 2023, ArXiv)
- RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution(Z. Geng, Luming Liang, Tianyu Ding, Ilya Zharkov, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Towards Robust and Generalizable Continuous Space-Time Video Super-Resolution with Events(Shuoyan Wei, Feng Li, Shengeng Tang, Runmin Cong, Yao Zhao, Meng Wang, Huihui Bai, 2025, ArXiv)
- Arbitrary-Scale Video Super-resolution Guided by Dynamic Context(Cong Huang, Jiahao Li, Lei Chu, Dong Liu, Yan Lu, 2024, No journal)
- Implicit Neural Representation for Video Restoration(M. Aiyetigbo, Wanqi Yuan, Feng Luo, Nianyi Li, 2025, ArXiv)
- Continuous Space-Time Video Super-Resolution with Multi-Stage Motion Information Reorganization(Yuantong Zhang, Daiqin Yang, Zhenzhong Chen, Wenpeng Ding, 2024, ACM Transactions on Multimedia Computing, Communications and Applications)
- Optical Flow Reusing for High-Efficiency Space-Time Video Super Resolution(Yuantong Zhang, Huairui Wang, Han Zhu, Zhenzhong Chen, 2021, ArXiv Preprint)
- SAVSR: Arbitrary-Scale Video Super-Resolution via a Learned Scale-Adaptive Network(Zekun Li, Hongying Liu, Fanhua Shang, Yuanyuan Liu, Liang Wan, Wei Feng, 2024, No journal)
- Transforming time and space: efficient video super-resolution with hybrid attention and deformable transformers(Linling Jiang, Xin Wang, Fan Zhang, Caiming Zhang, 2025, The Visual Computer)
- Efficient Space-time Video Super Resolution using Low-Resolution Flow and Mask Upsampling(Saikat Dutta, Nisarg A. Shah, Anurag Mittal, 2021, ArXiv Preprint)
- Learning Spatio-Temporal Downsampling for Effective Video Upscaling(Xiaoyu Xiang, Yapeng Tian, Vijay Rengarajan, Lucas D. Young, Bo Zhu, Rakesh Ranjan, 2022, No journal)
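A minimal sketch of the implicit-neural-representation decoding used for arbitrary-scale reconstruction: encoder features are sampled at a query location and an MLP maps (feature, x, y, t) to an RGB value, so any spatial resolution or timestamp can be queried. Layer sizes and the single-frame setup are illustrative assumptions.

```python
# Minimal sketch of implicit-neural-representation decoding for arbitrary-scale VSR:
# an MLP maps (sampled encoder feature, spatial coordinate, time) to an RGB value,
# so any output resolution or timestamp can be queried. Names/sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitVideoDecoder(nn.Module):
    def __init__(self, feat_channels=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_channels + 3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3))

    def forward(self, feat, coords):
        # feat:   (B, C, H, W)  feature map from an LR encoder for one reference frame
        # coords: (B, N, 3)     query points (x, y, t), each in [-1, 1]
        grid = coords[:, :, None, :2]                          # (B, N, 1, 2) for grid_sample
        sampled = F.grid_sample(feat, grid, mode='bilinear',
                                align_corners=False)            # (B, C, N, 1)
        sampled = sampled[..., 0].permute(0, 2, 1)              # (B, N, C)
        return self.mlp(torch.cat([sampled, coords], dim=-1))   # (B, N, 3) RGB

# Query a 4x-denser spatial grid at an intermediate timestamp t = 0.25.
feat = torch.rand(1, 32, 16, 16)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64), torch.linspace(-1, 1, 64), indexing='ij')
coords = torch.stack([xs, ys, torch.full_like(xs, 0.25)], dim=-1).reshape(1, -1, 3)
print(ImplicitVideoDecoder()(feat, coords).shape)               # torch.Size([1, 4096, 3])
```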
Blind Video Super-Resolution and Real-World Complex Degradations
Addresses the unknown blur kernels, noise, and compression artifacts of real-world footage, improving generalization to in-the-wild videos through self-supervised learning, degradation modeling, contrastive learning, or meta-learning strategies. A minimal synthetic-degradation sketch follows the reference list below.
- Self-Supervised Deep Blind Video Super-Resolution(Haoran Bai, Jin-shan Pan, 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- Blind Video Super-Resolution based on Implicit Kernels(Qiang Zhu, Yuxuan Jiang, Shuyuan Zhu, Fan Zhang, David R. Bull, Bing Zeng, 2025, ArXiv)
- Deep Blind Video Super-resolution(Jin-shan Pan, Song-Yang Cheng, Jiawei Zhang, Jinhui Tang, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))
- Kernel adaptive memory network for blind video super-resolution(June Yun, Min Hyuk Kim, Hyung-Il Kim, S. Yoo, 2023, Expert Syst. Appl.)
- Temporal Kernel Consistency for Blind Video Super-Resolution(Li Xiang, Royson Lee, Mohamed S. Abdelfattah, N. Lane, Hongkai Wen, 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW))
- DynaVSR: Dynamic Adaptive Blind Video Super-Resolution(Suyoung Lee, Myungsub Choi, Kyoung Mu Lee, 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV))
- Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning(Akash Gupta, P. Jonnalagedda, B. Bhanu, A. Roy-Chowdhury, 2021, Proceedings of the 29th ACM International Conference on Multimedia)
- RealPixVSR: Pixel-Level Visual Representation Informed Super-Resolution of Real-World Videos(Tony Nokap Park, Yunho Jeon, Taeyoung Na, 2024, 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW))
- NegVSR: Augmenting Negatives for Generalized Noise Modeling in Real-World Video Super-Resolution(Yexing Song, Meilin Wang, Zhijing Yang, Xiaoyu Xian, Yukai Shi, 2023, ArXiv Preprint)
- Expanding Synthetic Real-World Degradations for Blind Video Super Resolution(Mehran Jeelani, Sadbhawna, N. Cheema, K. Illgner-Fehns, Philipp Slusallek, S. Jaiswal, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Contrastive Learning for Controllable Blind Video Restoration(Givi Meishvili, Abdelaziz Djelouah, S. Hattori, Christopher Schroers, 2022, No journal)
- Investigating Tradeoffs in Real-World Video Super-Resolution(Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- RealViformer: Investigating Attention for Real-World Video Super-Resolution(Yuehan Zhang, Angela Yao, 2024, ArXiv)
- Deep Video Prior for Video Consistency and Propagation(Chenyang Lei, Yazhou Xing, Hao Ouyang, Qifeng Chen, 2022, ArXiv Preprint)
- Blind Video Temporal Consistency via Deep Video Prior(Chenyang Lei, Yazhou Xing, Qifeng Chen, 2020, ArXiv Preprint)
- Blind Super Resolution of Real-Life Video Sequences(Esmaeil Faramarzi, D. Rajan, Felix C. A. Fernandes, M. Christensen, 2016, IEEE Transactions on Image Processing)
- AIM 2025 Challenge on Robust Offline Video Super-Resolution: Dataset, Methods and Results(Nikolai Karetin, Ivan Molodetskikh, Dmitry Vatolin, R. Timofte, Yixin Yang, Junyang Chen, Jiangxin Dong, Jinshan Pan, Zhihao Liu, Lishen Qu, Shihao Zhou, Jufeng Yang, Yuxuan Jiang, Siyue Teng, Chengxi Zeng, Fan Zhang, David R. Bull, Qi Tang, Jie Liu, Jie Tang, Gangshan Wu, 2025, 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW))
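Training blind and real-world VSR models typically relies on synthetic degradation pipelines. Below is a minimal sketch of such a pipeline (random Gaussian blur, bicubic downsampling, additive noise); the pipelines in the cited works additionally cover compression, varied kernels, and temporal degradations, and all parameter ranges here are illustrative.

```python
# Minimal sketch of a synthetic degradation pipeline for blind / real-world VSR training:
# random Gaussian blur -> bicubic downsampling -> additive noise. Illustrative only.
import torch
import torch.nn.functional as F

def gaussian_kernel(ksize=7, sigma=1.5):
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def degrade(hr_frames, scale=4, sigma=None, noise_std=None):
    # hr_frames: (T, 3, H, W) in [0, 1]
    sigma = sigma if sigma is not None else float(torch.empty(1).uniform_(0.2, 3.0))
    noise_std = noise_std if noise_std is not None else float(torch.empty(1).uniform_(0.0, 0.1))
    k = gaussian_kernel(sigma=sigma).to(hr_frames)
    k = k.view(1, 1, *k.shape).repeat(3, 1, 1, 1)                 # depthwise blur kernel
    blurred = F.conv2d(hr_frames, k, padding=k.shape[-1] // 2, groups=3)
    lr = F.interpolate(blurred, scale_factor=1 / scale, mode='bicubic', align_corners=False)
    lr = lr + noise_std * torch.randn_like(lr)                    # sensor-like noise
    return lr.clamp(0, 1)

hr = torch.rand(5, 3, 128, 128)
print(degrade(hr).shape)                                          # torch.Size([5, 3, 32, 32])
```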
Lightweight Design and Real-Time Enhancement on Edge Devices
Focuses on practical deployment on mobile and other resource-constrained devices, optimizing inference speed through reparameterization, pruning, sub-pixel convolution, and efficient sliding-window designs. A minimal reparameterization sketch follows the reference list below.
- RVSRT: real-time video super resolution transformer(Linlin Ou, Yuanping Chen, 2023, No journal)
- RepNet-VSR: Reparameterizable Architecture for High-Fidelity Video Super-Resolution(Biao Wu, Diankai Zhang, Shaoli Liu, Si Gao, Chengjian Zheng, Ning Wang, 2025, ArXiv Preprint)
- Sliding Window Recurrent Network for Efficient Video Super-Resolution(Wenyi Lian, Wenjing Lian, 2022, ArXiv Preprint)
- Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network(Wenzhe Shi, Jose Caballero, Ferenc Huszár, J. Totz, Andrew P. Aitken, Rob Bishop, D. Rueckert, Zehan Wang, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
- Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation(Jose Caballero, C. Ledig, Andrew P. Aitken, Alejandro Acosta, J. Totz, Zehan Wang, Wenzhe Shi, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR))
- Real-Time Super-Resolution System of 4K-Video Based on Deep Learning(Yanpeng Cao, Chengcheng Wang, Changjun Song, Yongming Tang, He Li, 2021, 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP))
- FDDCC-VSR: a lightweight video super-resolution network based on deformable 3D convolution and cheap convolution(Xiaohua Wang, Xingdong Yang, Hengrui Li, Tao Li, 2024, The Visual Computer)
- Fast deblurring in video super resolution(Xueqing Yang, Tingting Fan, 2016, 2016 IEEE 13th International Conference on Signal Processing (ICSP))
- Fast image/video upsampling(Qi Shan, Zhaorong Li, Jiaya Jia, Chi-Keung Tang, 2008, ACM SIGGRAPH Asia 2008 papers)
- Low-Power Content-Based Video Acquisition for Super-Resolution Enhancement(Serene Banerjee, 2009, IEEE Transactions on Multimedia)
- Low-Resource Video Super-Resolution using Memory, Wavelets, and Deformable Convolutions(Kavitha Viswanathan, Shashwat Pathak, Piyush Bharambe, Harsh Choudhary, Amit Sethi, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
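Structural reparameterization, as used by RepNet-style efficient models, can be shown in a few lines: a training-time block with parallel 3x3, 1x1, and identity branches is folded into a single 3x3 convolution with an identical mapping for deployment. The block below is a generic sketch, not the cited architecture.

```python
# Minimal sketch of structural reparameterization: fold parallel 3x3 + 1x1 + identity
# branches into one 3x3 convolution that computes the same function. Illustrative only.
import torch
import torch.nn as nn

class RepBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                       # training-time multi-branch form
        return self.conv3(x) + self.conv1(x) + x

    @torch.no_grad()
    def fuse(self):
        """Fold the 3x3, 1x1, and identity branches into a single 3x3 conv."""
        c = self.conv3.out_channels
        w = self.conv3.weight.clone()
        w[:, :, 1, 1] += self.conv1.weight[:, :, 0, 0]             # 1x1 branch -> center tap
        w[torch.arange(c), torch.arange(c), 1, 1] += 1.0            # identity branch -> center tap
        fused = nn.Conv2d(c, c, 3, padding=1)
        fused.weight.copy_(w)
        fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused

block = RepBlock().eval()
fused = block.fuse()
x = torch.rand(1, 16, 32, 32)
print(torch.allclose(block(x), fused(x), atol=1e-5))                # True: identical mapping
```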
Domain-Driven Applications: Remote Sensing, Satellite, Face, and Medical Imaging
Customized enhancement for the particular challenges of vertical domains (e.g., tiny targets in satellite video, identity preservation for faces, the precision requirements of medical endoscopy), combining auxiliary information or prior knowledge.
- Multi-Frame Super-Resolution of Gaofen-4 Remote Sensing Images(Jieping Xu, Yonghui Liang, Jin Liu, Zongfu Huang, 2017, Sensors (Basel, Switzerland))
- Deep Blind Super-Resolution for Satellite Video(Yi Xiao, Qiangqiang Yuan, Qiang Zhang, L. Zhang, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution(Liangbin Xie, Xintao Wang, Honglun Zhang, Chao Dong, Ying Shan, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation(Xinyu Liu, Guolei Sun, Cheng Wang, Yixuan Yuan, E. Konukoglu, 2025, ArXiv)
- E-SEVSR—Edge Guided Stereo Endoscopic Video Super-Resolution(Mansoor Hayat, S. Aramvith, 2024, IEEE Access)
- Blind Restoration of High-Resolution Ultrasound Video(Chu Chen, Kangning Cui, Pasquale Cascarano, Wei Tang, E. L. Piccolomini, Raymond H. Chan, 2025, No journal)
- Omnidirectional Video Super-Resolution Using Deep Learning(Arbind Agrahari Baniya, Tsz-Kwan Lee, Peter W. Eklund, Sunil Aryal, 2025, IEEE Transactions on Multimedia)
- Enhancing Lunar Reconnaissance Orbiter Images via Multi-Frame Super Resolution for Future Robotic Space Missions(J. I. Delgado-Centeno, P. Sanchez-Cuevas, Carol Martínez, M. Olivares-Méndez, 2021, IEEE Robotics and Automation Letters)
- Multi-frame Super Resolution for Ocular Biometrics(N. Reddy, Dewan Fahim Noor, Zhu Li, R. Derakhshani, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Resolution enhancement of ROI from surveillance video using Bernstein interpolation(Minjae Kim, Hanseok Ko, 2011, 2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS))
Multi-Modal Guidance and Multi-Task Joint Restoration
Comprehensive approaches that exploit auxiliary sensors such as event cameras and depth maps, or that jointly handle multiple degradations including noise, blur, and compression artifacts. A minimal event-rasterization sketch follows the reference list below.
- EvSTVSR: Event Guided Space-Time Video Super-Resolution(Haojie Yan, Zhan Lu, Zehao Chen, De Ma, Huajin Tang, Qian Zheng, Gang Pan, 2025, No journal)
- Event-Enhanced Blurry Video Super-Resolution(Dachun Kai, Yueyi Zhang, Jin Wang, Zeyu Xiao, Zhiwei Xiong, Xiaoyan Sun, 2025, No journal)
- DAVIDE: Depth-Aware Video Deblurring(German F. Torres, Jussi Kalliola, Soumya Tripathy, Erman Acar, Joni-Kristian Kämäräinen, 2024, ArXiv Preprint)
- FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring(Geunhyuk Youk, Jihyong Oh, Munchurl Kim, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- FFSTIE: Video Restoration With Full-Frequency Spatio-Temporal Information Enhancement(Liqun Lin, Jianhui Wang, Guangpeng Wei, Mingxing Wang, Ang Zhang, 2025, IEEE Signal Processing Letters)
- Concurrent Video Denoising and Deblurring for Dynamic Scenes(E. Katsaros, Piotr Kopa Ostrowski, Daniel Wȩsierski, Anna Jezierska, 2021, IEEE Access)
- Video Denoising and Enhancement via Dynamic Video Layering(Han Guo, Namrata Vaswani, 2017, ArXiv Preprint)
- Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression(Ali Mollaahmadi Dehaghi, Reza Razavi, Mohammad Moshirpour, 2024, 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
- Restoring Real-World Degraded Events Improves Deblurring Quality(Yeqing Shen, Shang Li, Kun Song, 2024, ArXiv Preprint)
- EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events(Shuoyan Wei, Feng Li, Shengeng Tang, Yao Zhao, Huihui Bai, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
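Before events can guide frame restoration they are usually rasterized into a tensor. The sketch below accumulates events (x, y, t, polarity) into a voxel grid with a few temporal bins; the bin count, resolution, and random test data are illustrative assumptions.

```python
# Minimal sketch of event rasterization: accumulate (x, y, t, polarity) events into a
# voxel grid with a few temporal bins before fusing them with frames. Illustrative only.
import torch

def events_to_voxel_grid(events, num_bins=5, height=64, width=64):
    # events: (N, 4) tensor with columns (x, y, t, polarity in {-1, +1})
    x, y, t, p = events[:, 0].long(), events[:, 1].long(), events[:, 2], events[:, 3]
    t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9)            # normalize time to [0, 1]
    bin_idx = (t_norm * (num_bins - 1)).round().long()
    voxels = torch.zeros(num_bins, height, width)
    flat_idx = bin_idx * height * width + y * width + x
    voxels.view(-1).index_add_(0, flat_idx, p)                      # signed accumulation
    return voxels                                                    # (num_bins, H, W)

# 1000 random events on a 64x64 sensor over a 10 ms window.
n = 1000
events = torch.stack([
    torch.randint(0, 64, (n,)).float(),                             # x
    torch.randint(0, 64, (n,)).float(),                             # y
    torch.rand(n) * 0.01,                                            # t (seconds)
    torch.randint(0, 2, (n,)).float() * 2 - 1,                       # polarity
], dim=1)
print(events_to_voxel_grid(events).shape)                            # torch.Size([5, 64, 64])
```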
Raw-Domain / Burst Super-Resolution and Multi-Frame Fusion
Focuses on processing camera raw data, exploiting sub-pixel shifts between burst frames to recover detail, with emphasis on noise modeling and joint raw-to-RGB conversion. A minimal shift-and-add fusion sketch follows the reference list below.
- NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results(Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, Xueyi Zou, 2021, ArXiv Preprint)
- Multi-Frame Super-Resolution With Raw Images Via Modified Deformable Convolution(Gongzhe Li, Linwei Qiu, Haopeng Zhang, Feng-ying Xie, Zhi-guo Jiang, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask(Avinash Paliwal, Libing Zeng, Nima Khademi Kalantari, 2021, ArXiv Preprint)
- Deep Reparametrization of Multi-Frame Super-Resolution and Denoising(Goutam Bhat, Martin Danelljan, F. Yu, L. Gool, R. Timofte, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))
- GenMFSR: Generative Multi-Frame Image Restoration and Super-Resolution(Harshana Weligampola, Joshua Peter Ebenezer, Weidi Liu, Abhinau K. Venkataramanan, Sreenithy Chandran, Seok-Jun Lee, Hamid Rahim Sheikh, 2026, ArXiv Preprint)
- STIFS: Spatio-Temporal Input Frame Selection for Learning-based Video Super-Resolution Models(Arbind Agrahari Baniya, Tsz-Kwan Lee, P. Eklund, Sunil Aryal, 2022, No journal)
- Learnable Global Spatio-Temporal Adaptive Aggregation for Bracketing Image Restoration and Enhancement(Xinwei Dai, Yuanbo Zhou, Xintao Qiu, Hui Tang, Wei Deng, Qingquan Gao, Tong Tong, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- Robust Multi-Frame Super-Resolution Based on Adaptive Half-Quadratic Function and Local Structure Tensor Weighted BTV(Shanshan Liu, Minghui Wang, Qingbin Huang, Xia Liu, 2021, Sensors (Basel, Switzerland))
- Multi-Frame Super-Resolution Reconstruction Based on Gradient Vector Flow Hybrid Field(Shuying Huang, Jun Sun, Yong Yang, Yuming Fang, P. Lin, 2017, IEEE Access)
- A Single-Frame and Multi-Frame Cascaded Image Super-Resolution Method(Jing Sun, Qiangqiang Yuan, Huanfeng Shen, Jie Li, Liangpei Zhang, 2024, ArXiv Preprint)
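The core multi-frame idea can be illustrated with classical shift-and-add fusion: each burst frame is placed on a finer grid according to its sub-pixel shift and the samples are averaged. Real burst pipelines estimate the shifts and operate on raw mosaics; the known-offset placement below is a simplification for illustration.

```python
# Minimal sketch of classical shift-and-add multi-frame fusion with known sub-pixel shifts.
# Illustrative only: grayscale frames, nearest HR-grid placement, bilinear hole filling.
import torch
import torch.nn.functional as F

def shift_and_add(frames, shifts, scale=2):
    # frames: (N, 1, H, W) burst of grayscale frames
    # shifts: (N, 2) sub-pixel shifts (dy, dx) of each frame, in LR pixels
    n, _, h, w = frames.shape
    acc = torch.zeros(1, h * scale, w * scale)
    weight = torch.zeros_like(acc)
    for f, (dy, dx) in zip(frames, shifts):
        # Nearest HR grid position of this frame's samples.
        oy = int(round(float(dy) * scale)) % scale
        ox = int(round(float(dx) * scale)) % scale
        acc[:, oy::scale, ox::scale] += f
        weight[:, oy::scale, ox::scale] += 1.0
    fused = acc / weight.clamp(min=1.0)
    # Fill HR positions that received no sample with a bilinear upsampling of the mean frame.
    base = F.interpolate(frames.mean(dim=0, keepdim=True), scale_factor=scale,
                         mode='bilinear', align_corners=False)[0]
    return torch.where(weight > 0, fused, base)

frames = torch.rand(4, 1, 32, 32)
shifts = torch.tensor([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [0.5, 0.5]])
print(shift_and_add(frames, shifts).shape)                           # torch.Size([1, 64, 64])
```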
Traditional Mathematical Models, Optimization Methods, and Coding-Oriented Enhancement
Includes methods built on classical mathematical tools such as regularization, sparse representation, and variational formulations, as well as pre- and post-processing techniques coupled with video coding standards (H.265/VVC). A minimal variational-restoration sketch follows the reference list below.
- Novel image restoration method based on multi-frame super-resolution for atmospherically distorted images(Yinhao Li, Katsuhisa Ogawa, Yutaro Iwamoto, Yenwei Chen, 2020, IET Image Process.)
- Video Restoration with a Deep Plug-and-Play Prior(Antoine Monod, J. Delon, Matias Tassano, Andrés Almansa, 2022, ArXiv)
- An Augmented Lagrangian Method for Total Variation Video Restoration(Stanley H. Chan, Ramsin Khoshabeh, Kristofor B. Gibson, P. Gill, Truong Q. Nguyen, 2011, IEEE Transactions on Image Processing)
- Double Sparse Multi-Frame Image Super Resolution(Toshiyuki Kato, Hideitsu Hino, Noboru Murata, 2015, ArXiv Preprint)
- Multi-Frame Super-Resolution Combining Demons Registration and Regularized Bayesian Reconstruction(Thaís Pedruzzi do Nascimento, E. Salles, 2020, IEEE Signal Processing Letters)
- Compressed Video Super-Resolution based on Hierarchical Encoding(Yuxuan Jiang, Siyue Teng, Qiang Zhu, Chen Feng, Chengxi Zeng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David R. Bull, 2025, ArXiv)
- Video Compression Based on Spatio-Temporal Resolution Adaptation(Mariana Afonso, Fan Zhang, D. Bull, 2019, IEEE Transactions on Circuits and Systems for Video Technology)
- Learning-Based Scalable Video Coding with Spatial and Temporal Prediction(Martin Benjak, Yi-Hsin Chen, Wen-Hsiao Peng, Jörn Ostermann, 2023, 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP))
- FedVSR: Towards Model-Agnostic Federated Learning in Video Super-Resolution(Ali Mollaahmadi Dehaghi, Hossein KhademSohi, Reza Razavi, Steve Drew, Mohammad Moshirpour, 2025, ArXiv)
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution(Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu, 2022, ArXiv)
- DCTResNet: Transform Domain Image Deblocking for Motion Blur Images(Paras Maharjan, N. Xu, Xuan Xu, Yuyan Song, Zhu Li, 2021, 2021 International Conference on Visual Communications and Image Processing (VCIP))
- Temporal Down-sampling based Video Coding with Frame-Recurrent Enhancement(Keren He, Chen Fu, Chi Do-Kim Pham, Lu Zhang, Jinjia Zhou, 2023, 2023 Data Compression Conference (DCC))
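The variational formulation underlying several of these classical methods can be sketched as gradient descent on a data-fidelity term plus total-variation regularization, min_x 0.5*||x − y||² + λ·TV(x). The denoising-only forward model, smoothed TV, step size, and iteration count below are illustrative simplifications, not any cited algorithm.

```python
# Minimal sketch of TV-regularized restoration by gradient descent:
# minimize 0.5*||x - y||^2 + lam * TV(x) with a smoothed (differentiable) TV term.
import torch

def tv_restore(y, lam=0.1, step=0.2, iters=100):
    x = y.clone()
    for _ in range(iters):
        x = x.detach().requires_grad_(True)
        dh = x[:, :, 1:] - x[:, :, :-1]                   # horizontal differences
        dv = x[:, 1:, :] - x[:, :-1, :]                   # vertical differences
        tv = torch.sqrt(dh ** 2 + 1e-6).sum() + torch.sqrt(dv ** 2 + 1e-6).sum()
        loss = 0.5 * ((x - y) ** 2).sum() + lam * tv
        loss.backward()
        with torch.no_grad():
            x = x - step * x.grad
    return x.detach()

noisy = torch.rand(3, 32, 32)                              # toy "noisy" frame
print(tv_restore(noisy).shape)                             # torch.Size([3, 32, 32])
```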
This report organizes research on video enhancement and super-resolution into ten dimensions. The technical roadmap has evolved from classical CNN propagation mechanisms to Transformer/Mamba long-range modeling and on to generative reconstruction with diffusion models; application scenarios have expanded from general-purpose enhancement to vertical domains such as satellite remote sensing, face restoration, and raw-domain multi-frame fusion; on the engineering side, the field balances joint space-time super-resolution, generalization to blind degradations, lightweight deployment, and tight integration with video coding standards. Overall, the trend is a shift from pure pixel reconstruction toward perceptual enhancement, from fixed scale factors toward arbitrary scales, and from laboratory settings toward real-world complex degradations.
A total of 221 related references. Selected paper abstracts follow.
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
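A minimal sketch of the sub-pixel convolution described above: convolutions operate entirely in LR space, the last layer produces r² feature maps per output channel, and PixelShuffle rearranges them into the HR image. Layer sizes loosely follow the description but are illustrative assumptions.

```python
# Minimal sketch of sub-pixel convolution upscaling: all convolutions run in LR space
# and PixelShuffle rearranges r^2 feature maps per channel into the HR output.
import torch
import torch.nn as nn

class SubPixelUpsampler(nn.Module):
    def __init__(self, scale=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),   # r^2 maps per output channel
            nn.PixelShuffle(scale))                            # (B, 3*r^2, H, W) -> (B, 3, rH, rW)

    def forward(self, lr):
        return self.body(lr)

lr = torch.rand(1, 3, 60, 60)
print(SubPixelUpsampler()(lr).shape)                           # torch.Size([1, 3, 180, 180])
```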
Image diffusion models have been adapted for real-world video super-resolution to tackle over-smoothing issues in GAN-based methods. However, these models struggle to maintain temporal consistency, as they are trained on static images, limiting their ability to capture temporal dynamics effectively. Integrating text-to-video (T2V) models into video super-resolution for improved temporal modeling is straightforward. However, two key challenges remain: artifacts introduced by complex degradations in real-world scenarios, and compromised fidelity due to the strong generative capacity of powerful T2V models (e.g., CogVideoX-5B). To enhance the spatio-temporal quality of restored videos, we introduce STAR (Spatial-Temporal Augmentation with T2V models for Real-world video super-resolution), a novel approach that leverages T2V models for real-world video super-resolution, achieving realistic spatial details and robust temporal consistency. Specifically, we introduce a Local Information Enhancement Module (LIEM) before the global attention block to enrich local details and mitigate degradation artifacts. Moreover, we propose a Dynamic Frequency (DF) Loss to reinforce fidelity, guiding the model to focus on different frequency components across diffusion steps. Extensive experiments demonstrate STAR outperforms state-of-the-art methods on both synthetic and real-world datasets.
A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to an improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by a significant 0.82 dB in PSNR with similar number of parameters. BasicVSR++ is generalizable to other video restoration tasks, and obtains three champions and one first runner-up in NTIRE 2021 video restoration challenge.
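The flow-guided deformable alignment described above can be sketched roughly as follows: neighbor features are pre-warped with optical flow, a small head predicts residual offsets plus a modulation mask, and torchvision's deform_conv2d applies them. The offset head, channel counts, and the absence of second-order grid propagation are simplifications for illustration, not the authors' implementation.

```python
# Simplified sketch of flow-guided deformable alignment: pre-warp neighbor features with
# optical flow, then predict residual offsets and a modulation mask for deformable conv.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d

def flow_warp(feat, flow):
    # feat: (B, C, H, W), flow: (B, 2, H, W) with (dx, dy) in pixels
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack([xs, ys], dim=0).float().to(feat)            # (2, H, W)
    coords = grid[None] + flow                                        # target sampling positions
    coords_x = 2 * coords[:, 0] / max(w - 1, 1) - 1                   # normalize to [-1, 1]
    coords_y = 2 * coords[:, 1] / max(h - 1, 1) - 1
    grid_n = torch.stack([coords_x, coords_y], dim=-1)                # (B, H, W, 2)
    return F.grid_sample(feat, grid_n, mode='bilinear', padding_mode='border',
                         align_corners=True)

class FlowGuidedAlign(nn.Module):
    def __init__(self, channels=16, ksize=3):
        super().__init__()
        self.ksize = ksize
        self.offset_head = nn.Conv2d(2 * channels + 2, 3 * ksize * ksize, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, ksize, ksize) * 0.1)

    def forward(self, cur_feat, nbr_feat, flow):
        warped = flow_warp(nbr_feat, flow)
        pred = self.offset_head(torch.cat([cur_feat, warped, flow], dim=1))
        o1, o2, mask = torch.chunk(pred, 3, dim=1)
        # Residual offsets on top of the optical flow (deform_conv2d expects (dy, dx) pairs).
        offset = torch.cat([o1, o2], dim=1) + \
            flow.flip(1).repeat(1, self.ksize * self.ksize, 1, 1)
        return deform_conv2d(nbr_feat, offset, self.weight, padding=1,
                             mask=torch.sigmoid(mask))

cur, nbr = torch.rand(1, 16, 32, 32), torch.rand(1, 16, 32, 32)
flow = torch.rand(1, 2, 32, 32)
print(FlowGuidedAlign()(cur, nbr, flow).shape)                        # torch.Size([1, 16, 32, 32])
```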
Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components added with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. The BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.
How to aggregate spatial-temporal information plays an essential role in video super-resolution (VSR) tasks. Despite the remarkable success, existing methods adopt static convolution to encode spatial-temporal information, which lacks flexibility in aggregating information in large-scale remote sensing scenes, as they often contain heterogeneous features (e.g., diverse textures). In this paper, we propose a spatial feature diversity enhancement module (SDE) and channel diversity enhancement module (CDE), which explore the diverse representation of different local patterns while aggregating the global response with compactly channel-wise embedding representation. Specifically, SDE introduces multiple learnable filters to extract representative spatial variants and encodes them to generate a dynamic kernel for enriched spatial representation. To explore the diversity in the channel dimension, CDE exploits the discrete cosine transform to transform the feature into the frequency domain. This enriches the channel representation while mitigating massive frequency loss caused by pooling operation. Based on SDE and CDE, we further devise a multi-axis feature diversity enhancement (MADE) module to harmonize the spatial, channel, and pixel-wise features for diverse feature fusion. These elaborate strategies form a novel network for satellite VSR, termed MADNet, which achieves favorable performance against state-of-the-art method BasicVSR++ in terms of average PSNR by 0.14 dB on various video satellites, including JiLin-1, Carbonite-2, SkySat-1, and UrtheCast. Code will be available at https://github.com/XY-boy/MADNet
Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require, make inference extremely slow. Sampling acceleration techniques, particularly single-step, provide a potential solution. Nonetheless, achieving one step in VSR remains challenging, due to the high training overhead on video data and stringent fidelity demands. To tackle the above issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. DOVE is obtained by fine-tuning a pretrained video diffusion model (i.e., CogVideoX). To effectively train DOVE, we introduce the latent-pixel training strategy. The strategy employs a two-stage scheme to gradually adapt the model to the video super-resolution task. Meanwhile, we design a video processing pipeline to construct a high-quality dataset tailored for VSR, termed HQ-VSR. Fine-tuning on this dataset further enhances the restoration capability of DOVE. Extensive experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods. It also offers outstanding inference efficiency, achieving up to a 28× speed-up over existing methods such as MGLD-VSR. Code is available at: https://github.com/zhengchen1999/DOVE.
Omnidirectional Videos (or 360° videos) are widely used in Virtual Reality (VR) to facilitate immersive and interactive viewing experiences. However, the limited spatial resolution in 360° videos does not allow for each degree of view to be represented with adequate pixels, limiting the visual quality offered in the immersive experience. Deep learning Video Super-Resolution (VSR) techniques used for conventional videos could provide a promising software-based solution; however, these techniques do not tackle the distortion present in equirectangular projections of 360° video signals. An additional obstacle is the limited 360° video datasets to study. To address these issues, this paper creates a novel 360° Video Dataset (360VDS) with a study of the extensibility of conventional VSR models to 360° videos. This paper further proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR), called Spherical Signal Super-resolution with a Proportioned Optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment. With a purpose-built feature extractor and a novel loss-function addressing spherical distortion, S3PO outperforms most state-of-the-art conventional VSR models and 360° specific super-resolution models on 360° video datasets. A step-wise ablation study is presented to understand and demonstrate the impact of the chosen architectural sub-components, targeted training and optimisation.
Video super-resolution aims to generate high-resolution video sequences from corresponding low-resolution video sequences. To address the insufficient use of temporal and spatial information in current video super-resolution methods, we propose a new network based on deformable 3D convolutional group fusion. Input sequences are divided into groups according to different frame rates, which integrates temporal information effectively in a hierarchical manner. Deformable 3D convolution is used to fuse features within each group, preserving the spatial and temporal correlation of the video sequences. A temporal attention mechanism and a group integration module provide complementary information fusion across groups, restoring missing details in the video sequence and generating high-resolution video frames. Experimental results on the Vid4 standard video dataset show that the PSNR and SSIM of the generated high-resolution video frames are 27.39 and 0.8266, respectively. The proposed network handles motion video well and achieves better performance than current advanced methods.
Exploiting temporal correlations is crucial for video super-resolution (VSR). Recent approaches enhance this by incorporating event cameras. In this paper, we introduce MamEVSR, a Mamba-based network for event-based VSR that leverages the selective state space model, Mamba. MamEVSR stands out by offering global receptive field coverage with linear computational complexity, thus addressing the limitations of convolutional neural networks and Transformers. The key components of MamEVSR include: (1) The interleaved Mamba (iMamba) block, which interleaves tokens from adjacent frames and applies multidirectional selective state space modeling, enabling efficient feature fusion and propagation across bi-directional frames while maintaining linear complexity. (2) The cross-modality Mamba (cMamba) block facilitates further interaction and aggregation between event information and the output from the iMamba block. The cMamba block can leverage complementary spatio-temporal information from both modalities and allows MamEVSR to capture finer motion details. Experimental results show that the proposed MamEVSR achieves superior performance on various datasets quantitatively and qualitatively.
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames). Due to varying motion of cameras or objects, the reference frame and each support frame are not aligned. Therefore, temporal alignment is a challenging yet important problem for VSR. Previous VSR methods usually utilize optical flow between the reference frame and each supporting frame to warp the supporting frame for temporal alignment. However, both inaccurate flow and the image-level warping strategy will lead to artifacts in the warped supporting frames. To overcome the limitation, we propose a temporally-deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. The TDAN uses features from both the reference frame and each supporting frame to dynamically predict offsets of sampling convolution kernels. By using the corresponding kernels, TDAN transforms supporting frames to align with the reference frame. To predict the HR video frame, a reconstruction network taking aligned frames and the reference frame is utilized. Experimental results demonstrate that the TDAN is capable of alleviating occlusions and artifacts for temporal alignment and the TDAN-based VSR model outperforms several recent state-of-the-art VSR networks with a comparable or even much smaller model size. The source code and pre-trained models are released in https://github.com/YapengTian/TDAN-VSR.
Diffusion models have recently advanced video restoration, but applying them to real-world video super-resolution (VSR) remains challenging due to high latency, prohibitive computation, and poor generalization to ultra-high resolutions. Our goal in this work is to make diffusion-based VSR practical by achieving efficiency, scalability, and real-time performance. To this end, we propose FlashVSR, the first diffusion-based one-step streaming framework towards real-time VSR. FlashVSR runs at approximately 17 FPS for 768x1408 videos on a single A100 GPU by combining three complementary innovations: (i) a train-friendly three-stage distillation pipeline that enables streaming super-resolution, (ii) locality-constrained sparse attention that cuts redundant computation while bridging the train-test resolution gap, and (iii) a tiny conditional decoder that accelerates reconstruction without sacrificing quality. To support large-scale training, we also construct VSR-120K, a new dataset with 120k videos and 180k images. Extensive experiments show that FlashVSR scales reliably to ultra-high resolutions and achieves state-of-the-art performance with up to 12x speedup over prior one-step diffusion VSR models. We will release the code, pretrained models, and dataset to foster future research in efficient diffusion-based VSR.
Convolutional neural networks have enabled accurate image super-resolution in real-time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while maintaining real-time speed. Specifically, we discuss the use of early fusion, slow fusion and 3D convolutions for the joint processing of multiple consecutive video frames. We also propose a novel joint motion compensation and video super-resolution algorithm that is orders of magnitude more efficient than competing methods, relying on a fast multi-resolution spatial transformer module that is end-to-end trainable. These contributions provide both higher accuracy and temporally more consistent videos, which we confirm qualitatively and quantitatively. Relative to single-frame models, spatio-temporal networks can either reduce the computational cost by 30% whilst maintaining the same quality or provide a 0.2dB gain for a similar computational cost. Results on publicly available datasets demonstrate that the proposed algorithms surpass current state-of-the-art performance in both accuracy and efficiency.
It is a challenging problem to reproduce rich spatial details while maintaining temporal consistency in real-world video super-resolution (Real-VSR), especially when we leverage pre-trained generative models such as stable diffusion (SD) for realistic details synthesis. Existing SD-based Real-VSR methods often compromise spatial details for temporal coherence, resulting in suboptimal visual quality. We argue that the key lies in how to effectively extract the degradation-robust temporal consistency priors from the low-quality (LQ) input video and enhance the video details while maintaining the extracted consistency priors. To achieve this, we propose a Dual LoRA Learning (DLoRAL) paradigm to train an effective SD-based one-step diffusion model, achieving realistic frame details and temporal consistency simultaneously. Specifically, we introduce a Cross-Frame Retrieval (CFR) module to aggregate complementary information across frames, and train a Consistency-LoRA (C-LoRA) to learn robust temporal representations from degraded inputs. After consistency learning, we fix the CFR and C-LoRA modules and train a Detail-LoRA (D-LoRA) to enhance spatial details while aligning with the temporal space defined by C-LoRA to keep temporal coherence. The two phases alternate iteratively for optimization, collaboratively delivering consistent and detail-rich outputs. During inference, the two LoRA branches are merged into the SD model, allowing efficient and high-quality video restoration in a single diffusion step. Experiments show that DLoRAL achieves strong performance in both accuracy and speed. Code and models are available at https://github.com/yjsunnn/DLoRAL.
Video super-resolution (VSR) has become even more important recently to provide high resolution (HR) contents for ultra high definition displays. While many deep learning based VSR methods have been proposed, most of them rely heavily on the accuracy of motion estimation and compensation. We introduce a fundamentally different framework for VSR in this paper. We propose a novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, which are computed depending on the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation. With our approach, an HR image is reconstructed directly from the input image using the dynamic upsampling filters, and the fine details are added through the computed residual. Our network with the help of a new data augmentation technique can generate much sharper HR videos with temporal consistency, compared with the previous methods. We also provide analysis of our network through extensive experiments to show how the network deals with motions implicitly.
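The dynamic filtering idea can be sketched with per-pixel predicted kernels: a small network outputs a k×k filter for every location and each pixel's neighborhood is filtered with its own kernel. The upsampling arrangement of the original dynamic upsampling filters is omitted here; the filter size and prediction head are illustrative assumptions.

```python
# Minimal sketch of dynamic per-pixel filtering: predict a separate k x k filter for every
# location and filter each pixel's neighborhood with its own kernel (no motion compensation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterLayer(nn.Module):
    def __init__(self, channels=3, ksize=5):
        super().__init__()
        self.ksize = ksize
        # Predict ksize*ksize filter taps per pixel from the input itself.
        self.filter_net = nn.Conv2d(channels, ksize * ksize, 3, padding=1)

    def forward(self, x):                                      # (B, C, H, W)
        b, c, h, w = x.shape
        filters = F.softmax(self.filter_net(x), dim=1)          # (B, k*k, H, W), taps sum to 1
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)     # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.ksize * self.ksize, h, w)
        out = (patches * filters.unsqueeze(1)).sum(dim=2)       # per-pixel filtering
        return out                                              # (B, C, H, W)

x = torch.rand(1, 3, 32, 32)
print(DynamicFilterLayer()(x).shape)                            # torch.Size([1, 3, 32, 32])
```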
This paper presents the AIM 2025 Challenge on Robust Offline Video Super-Resolution, the first challenge focusing on 4× upscaling of heavily degraded 270p videos to high-quality 1080p sequences. The challenge addresses the practical problem of enhancing low-quality video content while suppressing noise, blur, and compression artifacts under realistic hardware constraints. We introduce a comprehensive benchmark consisting of 30 diverse video clips spanning camera-shot and animated content, along with a novel synthetic degradation pipeline that ensures reproducible results. Our evaluation methodology employs subjective pairwise comparisons conducted through crowdsourcing. The challenge attracted significant participation and established new baselines for robust video super-resolution in challenging real-world scenarios.
Continuous space-time video super-resolution (C-STVSR) endeavors to upscale videos simultaneously at arbitrary spatial and temporal scales, which has recently garnered increasing interest. However, prevailing methods struggle to yield satisfactory videos at out-of-distribution spatial and temporal scales. On the other hand, event streams characterized by high temporal resolution and high dynamic range, exhibit compelling promise in vision tasks. This paper presents EvEnhancer, an innovative approach that marries the unique advantages of event streams to elevate effectiveness, efficiency, and generalizability for C-STVSR. Our approach hinges on two pivotal components: 1) Event-adapted synthesis capitalizes on the spatiotemporal correlations between frames and events to discern and learn long-term motion trajectories, enabling the adaptive interpolation and fusion of informative spatiotemporal features; 2) Local implicit video transformer integrates local implicit video neural function with cross-scale spatiotemporal attention to learn continuous video representations utilized to generate plausible videos at arbitrary resolutions and frame rates. Experiments show that EvEnhancer achieves superiority on synthetic and real-world datasets and preferable generalizability on out-of-distribution scales against state-of-the-art methods. Code is available at https://github.com/W-Shuoyan/EvEnhancer.
In this paper, we tackle the task of blurry video super-resolution (BVSR), aiming to generate high-resolution (HR) videos from low-resolution (LR) and blurry inputs. Current BVSR methods often fail to restore sharp details at high resolutions, resulting in noticeable artifacts and jitter due to insufficient motion information for deconvolution and the lack of high-frequency details in LR frames. To address these challenges, we introduce event signals into BVSR and propose a novel event-enhanced network, Ev-DeblurVSR. To effectively fuse information from frames and events for feature deblurring, we introduce a reciprocal feature deblurring module that leverages motion information from intra-frame events to deblur frame features while reciprocally using global scene context from the frames to enhance event features. Furthermore, to enhance temporal consistency, we propose a hybrid deformable alignment module that fully exploits the complementary motion information from inter-frame events and optical flow to improve motion estimation in the deformable alignment process. Extensive evaluations demonstrate that Ev-DeblurVSR establishes a new state-of-the-art performance on both synthetic and real-world datasets. Notably, on real data, our method is 2.59 dB more accurate and 7.28× faster than the recent best BVSR baseline FMA-Net.
Diffusion models have shown great potential in generating realistic image detail. However, adapting these models to video super-resolution (VSR) remains challenging due to their inherent stochasticity and lack of temporal modeling. Previous methods have attempted to mitigate this issue by incorporating motion information and temporal layers. However, unreliable motion estimation from low-resolution videos and costly multiple sampling steps with deep temporal layers limit them to short sequences. In this paper, we propose UltraVSR, a novel framework that enables ultra-realistic and temporally-coherent VSR through an efficient one-step diffusion space. A central component of UltraVSR is the Degradation-aware Reconstruction Scheduling (DRS), which estimates a degradation factor from the low-resolution input and transforms the iterative denoising process into a single-step reconstruction from low-resolution to high-resolution videos. To ensure temporal consistency, we propose a lightweight Recurrent Temporal Shift (RTS) module, including an RTS-convolution unit and an RTS-attention unit. By partially shifting feature components along the temporal dimension, it enables effective propagation, fusion, and alignment across frames without explicit temporal layers. The RTS module is integrated into a pretrained text-to-image diffusion model and is further enhanced through Spatio-temporal Joint Distillation (SJD), which improves temporally coherence while preserving realistic details. Additionally, we introduce a Temporally Asynchronous Inference (TAI) strategy to capture long-range temporal dependencies under limited memory constraints. Extensive experiments show that UltraVSR achieves state-of-the-art performance, both qualitatively and quantitatively, in a single sampling step. Code is available at https://github.com/yongliuy/UltraVSR.
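The recurrent temporal shift described above is related to the generic temporal-shift idea sketched below: slices of the channel dimension are shifted one step forward or backward in time, so later 2D operations mix information from neighboring frames. The split ratio and usage are illustrative, not the cited module.

```python
# Minimal sketch of a temporal shift operation: shift a fraction of channels one step
# forward in time and another fraction one step backward; the rest stay in place.
import torch

def temporal_shift(feats, fold_div=8):
    # feats: (B, T, C, H, W)
    b, t, c, h, w = feats.shape
    fold = c // fold_div
    out = torch.zeros_like(feats)
    out[:, 1:, :fold] = feats[:, :-1, :fold]                         # shift forward in time
    out[:, :-1, fold:2 * fold] = feats[:, 1:, fold:2 * fold]         # shift backward in time
    out[:, :, 2 * fold:] = feats[:, :, 2 * fold:]                    # remaining channels unchanged
    return out

feats = torch.rand(2, 5, 16, 8, 8)
shifted = temporal_shift(feats)
print(shifted.shape, torch.equal(shifted[:, :, 4:], feats[:, :, 4:]))  # unchanged channels intact
```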
We proposed a novel architecture for the problem of video super-resolution. We integrate spatial and temporal contexts from continuous video frames using a recurrent encoder-decoder module, that fuses multi-frame information with the more traditional, single frame super-resolution path for the target frame. In contrast to most prior work where frames are pooled together by stacking or warping, our model, the Recurrent Back-Projection Network (RBPN) treats each context frame as a separate source of information. These sources are combined in an iterative refinement framework inspired by the idea of back-projection in multiple-image super-resolution. This is aided by explicitly representing estimated inter-frame motion with respect to the target, rather than explicitly aligning frames. We propose a new video super-resolution benchmark, allowing evaluation at a larger scale and considering videos in different motion regimes. Experimental results demonstrate that our RBPN is superior to existing methods on several datasets.
We present a joint learning scheme of video super-resolution and deblurring, called VSRDB, to restore clean high-resolution (HR) videos from blurry low-resolution (LR) ones. This joint restoration problem has drawn much less attention compared to single restoration problems. In this paper, we propose a novel flow-guided dynamic filtering (FGDF) and iterative feature refinement with multi-attention (FRMA), which constitutes our VSRDB framework, denoted as FMA-Net. Specifically, our proposed FGDF enables precise estimation of both spatiotemporally-variant degradation and restoration kernels that are aware of motion trajectories through sophisticated motion representation learning. Compared to conventional dynamic filtering, the FGDF enables the FMA-Net to effectively handle large motions into the VSRDB. Additionally, the stacked FRMA blocks trained with our novel temporal anchor (TA) loss, which temporally anchors and sharpens features, refine features in a coarse-to-fine manner through iterative updates. Extensive experiments demonstrate the superiority of the proposed FMA-Net over state-of-the-art methods in terms of both quantitative and qualitative quality. Codes and pretrained models are available at: https://kaist-viclab.github.io/fmanetsite.
Text-based diffusion models have exhibited remarkable success in generation and editing, showing great promise for enhancing visual content with their generative prior. However, applying these models to video super-resolution remains challenging due to the high demands for output fidelity and temporal consistency, which is complicated by the inherent randomness in diffusion models. Our study introduces Upscale-A-Video, a text-guided latent diffusion framework for video upscaling. This framework ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences; globally, without training, a flow-guided recurrent latent propagation module is introduced to enhance overall video stability by propagating and fusing latent across the entire sequences. Thanks to the diffusion paradigm, our model also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation, enabling a trade-off between fidelity and quality. Extensive experiments show that Upscale-A-Video surpasses existing methods in both synthetic and real-world benchmarks, as well as in AI-generated videos, showcasing impressive visual realism and temporal consistency.
In real-world video super-resolution (VSR), videos suffer from in-the-wild degradations and artifacts. VSR methods, especially recurrent ones, tend to propagate artifacts over time in the real-world setting and are more vulnerable than image super-resolution. This paper investigates the influence of artifacts on commonly used covariance-based attention mechanisms in VSR. Comparing the widely-used spatial attention, which computes covariance over space, versus the channel attention, we observe that the latter is less sensitive to artifacts. However, channel attention leads to feature redundancy, as evidenced by the higher covariance among output channels. As such, we explore simple techniques such as the squeeze-excite mechanism and covariance-based rescaling to counter the effects of high channel covariance. Based on our findings, we propose RealViformer. This channel-attention-based real-world VSR framework surpasses state-of-the-art on two real-world VSR datasets with fewer parameters and faster runtimes. The source code is available at https://github.com/Yuehan717/RealViformer.
The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image precleaning stage indispensable to reduce noises and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency (Fig. 1). Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces up to 40% of training time without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective uses of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset are publicly available at https://github.com/ckkelvinchan/RealBasicVSR.
Diffusion models are just at a tipping point for image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo), for video super-resolution. SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction. Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE. SFA modulates frame features via adaptively estimating affine parameters for each pixel, guaranteeing pixel-wise guidance for high-resolution frame synthesis. TFA delves into feature interaction within a 3D local window (tubelet) through self-attention, and executes cross-attention between tubelet and its low-resolution counterpart to guide temporal feature alignment. Extensive experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
Recently, Vision Transformer has achieved great success in recovering missing details in low-resolution sequences, i.e., the video super-resolution (VSR) task. Despite its superiority in VSR accuracy, the heavy computational burden as well as the large memory footprint hinder the deployment of Transformer-based VSR models on constrained devices. In this paper, we address the above issue by proposing a novel feature-level masked processing framework: VSR with Masked Intra and inter-frame Attention (MIA-VSR). The core of MIA-VSR is leveraging feature-level temporal continuity between adjacent frames to reduce redundant computations and make more rational use of previously enhanced SR features. Concretely, we propose an intra-frame and inter-frame attention block which takes the respective roles of past features and input features into consideration and only exploits previously enhanced features to provide supplementary information. In addition, an adaptive block-wise mask prediction module is developed to skip unimportant computations according to feature similarity between adjacent frames. We conduct detailed ablation studies to validate our contributions and compare the proposed method with recent state-of-the-art VSR approaches. The experimental results demonstrate that MIA-VSR improves the memory and computation efficiency over state-of-the-art methods, without trading off PSNR accuracy. The code is available at https://github.com/LabShuHangGU/MIA-VSR.
Video super-resolution (VSR) models achieve temporal consistency but often produce blurrier results than their image-based counterparts due to limited generative capacity. This prompts the question: can we adapt a generative image upsampler for VSR while preserving temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that combines high-frequency detail with temporal stability, building on the large-scale GigaGAN image upsampler. Simple adaptations of GigaGAN for VSR led to flickering issues, so we propose techniques to enhance temporal consistency. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with 8× upsampling.
Previous CNN-based video super-resolution approaches need to align multiple frames to the reference. In this paper, we show that proper frame alignment and motion compensation are crucial for achieving high-quality results. We accordingly propose a “sub-pixel motion compensation” (SPMC) layer in a CNN framework. Analysis and experiments show the suitability of this layer in video SR. The final end-to-end, scalable CNN framework effectively incorporates the SPMC layer and fuses multiple frames to reveal image details. Our implementation can generate visually and quantitatively high-quality results, superior to the current state of the art, without the need for parameter tuning.
Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) Each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently conditioned on the input frames, limiting the system's ability to produce temporally consistent results. In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method has the ability to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework is able to significantly outperform the current state of the art.
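The recurrent step described above can be sketched in PyTorch as follows: only the previous HR estimate is warped with the upscaled flow, then packed back to LR resolution with space-to-depth and concatenated with the current LR frame. The `sr_net` placeholder, the warping helper, and the flow convention are assumptions for illustration rather than the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def flow_warp(x, flow):
    """Backward-warp x (B,C,H,W) with per-pixel flow (B,2,H,W) given in pixels."""
    b, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=x.device, dtype=x.dtype),
                            torch.arange(w, device=x.device, dtype=x.dtype), indexing="ij")
    grid = torch.stack(((xs + flow[:, 0]) / max(w - 1, 1) * 2 - 1,
                        (ys + flow[:, 1]) / max(h - 1, 1) * 2 - 1), dim=-1)
    return F.grid_sample(x, grid, align_corners=True)

def frame_recurrent_step(lr_t, hr_prev, lr_flow, sr_net, scale=4):
    """One frame-recurrent step: warp the previous HR estimate with the (upscaled) flow,
    space-to-depth it to LR resolution, concatenate with the current LR frame, and predict
    the next HR frame. lr_flow is the backward flow from the current to the previous LR frame."""
    hr_flow = F.interpolate(lr_flow, scale_factor=scale, mode="bilinear", align_corners=False) * scale
    hr_warped = flow_warp(hr_prev, hr_flow)
    packed = F.pixel_unshuffle(hr_warped, scale)          # (B, C*scale^2, h, w)
    return sr_net(torch.cat([lr_t, packed], dim=1))       # next HR estimate
```

Here `sr_net` is assumed to be any CNN that maps the concatenated LR-resolution tensor to an HR frame, e.g. with a pixel-shuffle head.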
Deep learning-based video super-resolution (VSR) networks have gained significant performance improvements in recent years. However, existing VSR networks can only support a fixed integer scale super-resolution task, and when we want to perform VSR at multiple scales, we need to train several models. This implementation certainly increases the consumption of computational and storage resources, which limits the application scenarios of VSR techniques. In this paper, we propose a novel Scale-adaptive Arbitrary-scale Video Super-Resolution network (SAVSR), which is the first work focusing on spatial VSR at arbitrary scales including both non-integer and asymmetric scales. We also present an omni-dimensional scale-attention convolution, which dynamically adapts according to the scale of the input to extract inter-frame features with stronger representational power. Moreover, the proposed spatio-temporal adaptive arbitrary-scale upsampling performs VSR tasks using both temporal features and scale information. And we design an iterative bi-directional architecture for implicit feature alignment. Experiments at various scales on the benchmark datasets show that the proposed SAVSR outperforms state-of-the-art (SOTA) methods at non-integer and asymmetric scales. The source code is available at https://github.com/Weepingchestnut/SAVSR.
Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies since we observe that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, a Long-term Temporal Difference Module (L-TDM) is proposed, where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. Moreover, we further propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and temporal compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at https://github.com/XY-boy/LGTD.
Integrating Stereo Imaging technology into medical diagnostics and surgeries marks a significant revolution in medical sciences. This advancement gives surgeons and physicians a deeper understanding of patients’ organ anatomy. However, like any technology, stereo cameras have their limitations, such as low resolution (LR) and output images that are often blurry. Our paper introduces a novel approach—a multi-stage network with a pioneering Stereo Endoscopic Attention Module (SEAM). This network aims to progressively enhance the quality of super-resolution (SR), moving from coarse to fine details. Specifically, we propose an edge-guided stereo attention mechanism integrated into each interaction of stereo features. This mechanism aims to capture consistent structural details across different views more effectively. Our proposed model demonstrates superior super-resolution reconstruction performance through comprehensive quantitative evaluations and experiments conducted on three datasets. Our E-SEVSR framework demonstrates superiority over alternative approaches. This framework leverages the edge-guided stereo attention mechanism within the multi-stage network, improving super-resolution quality in medical imaging applications.
In this paper, we explore the space-time video super-resolution task, which aims to generate a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video. A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task. Two-stage methods cannot fully take advantage of the natural property. In addition, state-of-the-art VFI or VSR networks require a large frame-synthesis or reconstruction module for predicting high-quality video frames, which makes the two-stage methods have large model sizes and thus be time-consuming. To overcome the problems, we propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video. Rather than synthesizing missing LR video frames as VFI networks do, we firstly temporally interpolate LR frame features in missing LR video frames capturing local temporal contexts by the proposed feature temporal interpolation network. Then, we propose a deformable ConvLSTM to align and aggregate temporal information simultaneously for better leveraging global temporal contexts. Finally, a deep reconstruction network is adopted to predict HR slow-motion video frames. Extensive experiments on benchmark datasets demonstrate that the proposed method not only achieves better quantitative and qualitative performance but also is more than three times faster than recent two-stage state-of-the-art methods, e.g., DAIN+EDVR and DAIN+RBPN.
Video super-resolution techniques aim to obtain high-resolution equivalents of existing low-resolution videos through a series of operations. In recent research, transformers have been increasingly popular because of their remarkable abilities in parallel computing and efficient extraction of space-time sequence features from videos. Moreover, combining self-attention and multi-scale methods has yielded excellent results. However, the combination of the two methods has limitations: current up-sampling methods struggle to match the global modeling capacity of self-attention mechanisms. Therefore, this paper proposes three strategies to combine the two methods. Based on an approximation strategy, we first construct a new bilinear up-sampling method for multi-scale acquisition. Convolution and cross-attention techniques are then used to correct and align features at different scales to prevent large deviations in feature extraction at a specific scale, which can affect subsequent feature extraction. Finally, to effectively address the computational complexity, $C^0$ continuity, and neuron death problems common to existing activation functions, a new method of constructing the activation function is proposed: a cubic spline function is used to construct a new activation function approximating tanh. The new activation function is $C^2$ continuous and is piecewise defined by cubic polynomial curves. In this study, better results were achieved on three public video super-resolution test sets: REDS4, Vid4, and Vimeo-90K-T. Experiments demonstrated that the proposed method could provide a new solution for video super-resolution tasks.
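For the activation construction, a generic version of the idea (a piecewise-cubic, $C^2$-continuous approximation of tanh built from a natural cubic spline) can be written in a few lines. The knot placement below is an assumption, not the paper's coefficients.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# A natural cubic spline through samples of tanh is piecewise cubic and C^2-continuous
# inside the knot range; this is the general construction, not the paper's exact activation.
knots = np.linspace(-3.0, 3.0, 13)
spline_tanh = CubicSpline(knots, np.tanh(knots), bc_type="natural")

x = np.linspace(-3.0, 3.0, 1001)
print("max |spline - tanh| on [-3, 3]:", np.abs(spline_tanh(x) - np.tanh(x)).max())
```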
Video super-resolution (VSR) is used to compose high-resolution (HR) video from low-resolution video. Recently, the deformable alignment-based VSR methods are becoming increasingly popular. In these methods, the features extracted from video are aligned to eliminate the motion error targeting high super-resolution (SR) quality. However, these methods often suffer from misalignment and the lack of enough temporal information to compose HR frames, which accordingly induce artifacts in the SR result. In this article, we design a deep VSR network (DVSRNet) based on the proposed progressive deformable alignment (PDA) module and temporal-sparse enhancement (TSE) module. Specifically, the PDA module is designed to accurately align features and to eliminate artifacts via the bidirectional information propagation. The TSE module is constructed to further eliminate artifacts and to generate clear details for the HR frame. In addition, we construct a lightweight deep optical flow network (OFNet) to obtain the bidirectional optical flows for the implementation of the PDA module. Moreover, two new loss functions are designed for our proposed method. The first one is adopted in OFNet and the second one is constructed to guarantee the generation of sharp and clear details for the HR frames. The experimental results demonstrate that our method performs better than the state-of-the-art methods.
Event cameras are novel bio-inspired cameras that record asynchronous events with high temporal resolution and dynamic range. Leveraging the auxiliary temporal information recorded by event cameras holds great promise for the task of video super-resolution (VSR). However, existing event-guided VSR methods assume that the event and RGB cameras are strictly calibrated (e.g., pixel-level sensor designs in DAVIS 240/346). This assumption proves limiting in emerging high-resolution devices, such as dual-lens smartphones and unmanned aerial vehicles, where such precise calibration is typically unavailable. To unlock more event-guided application scenarios, we perform the task of asymmetric event-guided VSR for the first time, and we propose an Asymmetric Event-guided VSR Network (AsEVSRN) for this new task. AsEVSRN incorporates two specialized designs for leveraging the asymmetric event stream in VSR. Firstly, the content hallucination module dynamically enhances event and RGB information by exploiting their complementary nature, thereby adaptively boosting representational capacity. Secondly, the event-enhanced bidirectional recurrent cells align and propagate temporal features fused with features from content-hallucinated frames. Within the bidirectional recurrent cells, event-enhanced flow is employed to simultaneously utilize and fuse temporal information at both the feature and pixel levels. Comprehensive experimental results affirm that our method consistently generates superior quantitative and qualitative results.
Video super-resolution (VSR) aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts. Although some progress has been made, there are grand challenges to effectively utilize temporal dependency in entire video sequences. Existing approaches usually align and aggregate video frames from limited adjacent frames (e.g., 5 or 7 frames), which prevents these approaches from satisfactory results. In this paper, we take one step further to enable effective spatio-temporal learning in videos. We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). In particular, we formulate video frames into several pre-aligned trajectories which consist of continuous visual tokens. For a query token, self-attention is only learned on relevant visual tokens along spatio-temporal trajectories. Compared with vanilla vision Transformers, such a design significantly reduces the computational cost and enables Transformers to model long-range features. We further propose a cross-scale feature tokenization module to overcome scale-changing problems that often occur in long-range videos. Experimental results demonstrate the superiority of the proposed TTVSR over state-of-the-art models, by extensive quantitative and qualitative evaluations in four widely-used video super-resolution benchmarks. Both code and pre-trained models can be downloaded at https://github.com/researchmm/TTVSR.
The alignment of adjacent frames is considered an essential operation in video super-resolution (VSR). Advanced VSR models, including the latest VSR Transformers, are generally equipped with well-designed alignment modules. However, the progress of the self-attention mechanism may violate this common sense. In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations. Our experiments show that: (i) VSR Transformers can directly utilize multi-frame information from unaligned videos, and (ii) existing alignment methods are sometimes harmful to VSR Transformers. These observations indicate that we can further improve the performance of VSR Transformers simply by removing the alignment module and adopting a larger attention window. Nevertheless, such designs will dramatically increase the computational burden, and cannot deal with large motions. Therefore, we propose a new and efficient alignment method called patch alignment, which aligns image patches instead of pixels. VSR Transformers equipped with patch alignment could demonstrate state-of-the-art performance on multiple benchmarks. Our work provides valuable insights on how multi-frame information is used in VSR and how to select alignment methods for different networks/datasets. Codes and models will be released at https://github.com/XPixelGroup/RethinkVSRAlignment.
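The patch-alignment idea can be sketched as follows: instead of warping every pixel with its own flow vector, each patch is moved rigidly by one (averaged) motion vector, so its content stays intact for the subsequent window attention. The pooling/broadcast scheme and bilinear resampling below are simplifying assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def patch_align(frame, flow, patch=8):
    """Align a neighbouring frame to the reference by moving whole patches instead of pixels.
    Assumes H and W are divisible by the patch size."""
    b, c, h, w = frame.shape
    # one motion vector per patch: average the flow inside each patch, then broadcast it back
    patch_flow = F.avg_pool2d(flow, patch)                          # (B, 2, H/p, W/p)
    patch_flow = F.interpolate(patch_flow, scale_factor=patch, mode="nearest")
    # backward warping with the resulting piecewise-constant flow field
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device, dtype=frame.dtype),
                            torch.arange(w, device=frame.device, dtype=frame.dtype), indexing="ij")
    grid = torch.stack(((xs + patch_flow[:, 0]) / max(w - 1, 1) * 2 - 1,
                        (ys + patch_flow[:, 1]) / max(h - 1, 1) * 2 - 1), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True, padding_mode="border")
```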
Recently, due to higher requirements for satellite video resolution, video super-resolution (VSR) has been extensively studied. However, the following problems have not been effectively resolved: 1) previous satellite VSR methods cannot achieve continuous-scale (integer and noninteger scale) VSR with a single model; 2) satellite video has complex ground and weak textures, which increases the difficulty of capturing motion information. In addition, existing methods adopt a unified alignment path, which leads to a drop in feature alignment accuracy; and 3) during feature fusion, previous methods ignore the correlation of spatiotemporal information in satellite video and cannot make full use of the spatiotemporal information. To address the above problems, in this article, we propose a novel network for continuous-scale satellite VSR (CSVSR). Specifically, first, for effective motion capture and accurate feature alignment, we design a residual-guided and time-aware dynamic routing alignment module, which can use feature residuals to lock motion areas and then dynamically select the corresponding alignment path based on the temporal distance. Second, we propose a nonlocal mask-based feature fusion module (NMFFM) to exploit the correlation of the spatiotemporal features and complete effective spatiotemporal feature fusion. Third, to make our network adapt to multitask learning, we develop a scale-aware convolutional (SA-Conv) layer, which lets our network dynamically extract scale-adaptive features according to the input scale factors. Finally, we propose a continuous-scale upsampling module with a global feature implicit function (GFIF), which can achieve continuous-scale mapping from features to pixel values. In addition, we carefully design a novel training strategy to optimize our network. Comprehensive experiments verify that the proposed CSVSR has superior reconstruction performance on continuous-scale factors. The code will be available at https://github.com/chongningni/CSVSR.
Intelligent processing and analysis of satellite video has become one of the research hotspots in remote sensing, and satellite video super-resolution (SVSR) is an important research direction that can improve the image quality of satellite video. However, existing approaches for SVSR often underutilize a notable advantage inherent to satellite video: the presence of extensive sequential imagery capturing a consistent scene. Presently, the majority of SVSR methods merely harness a limited number of adjacent frames for enhancing the resolution of individual frames, thus resulting in suboptimal information utilization. In response, we introduce the recurrent aggregation network for satellite video super-resolution (RASVSR). This framework leverages a bidirectional recurrent neural network to propagate extracted features from each frame across the entire video sequence. It relies on an alignment method based on optical flow and deformable convolution (DCN) to align the features, and on a temporal feature fusion module to fuse features effectively over time. Notably, our research underscores the positive influence of employing lengthier image sequences in SVSR. In RASVSR, with better alignment and fusion, the receptive field of each frame spans 100 frames of the video, acquiring richer information and allowing information from different frames to complement each other. This strategic approach culminates in superior performance compared with alternative methods, as evidenced by a noteworthy 1.15 dB improvement in PSNR, with very few parameters.
Video super-resolution (VSR) is an important area in computer vision, aimed at reconstructing high-resolution video frames from low-resolution inputs. The task is especially challenging due to the presence of noise, motion blur, and the need to maintain temporal consistency. In this work, we present a comparative analysis of deep learning-based VSR techniques, with a particular focus on our proposed method, FRVSRGAN, a hybrid model that integrates Frame-Recurrent Video Super-Resolution (FRVSR) and Super-Resolution Generative Adversarial Networks (SRGAN). The model improves perceptual fidelity by incorporating optical flow and super-resolution networks together, thereby producing high-resolution outputs that are visually sharper and more realistic. Our evaluation focuses on infrared video sequences, which are inherently more difficult to process due to limited resolution, higher noise levels, and the scarcity of proper training data. We employ both structural and perceptual no-reference metrics, such as PIQE and BRISQUE, to measure the effectiveness of the proposed model.
To address the problems in existing video super-resolution methods, such as noise, over-smoothing, and visual artifacts, which are caused by the reliance on limited external training data or the mismatch of internal similar patch instances, this study proposes a novel video super-resolution reconstruction algorithm based on deep learning and spatio-temporal feature similarity (DLSS-VSR). A video super-resolution reconstruction mechanism with joint internal and external constraints is established by utilizing the complementary advantages of external deep correlation mapping learning and an internal spatio-temporal nonlocal self-similarity prior constraint. A deep learning model based on a deep convolutional neural network is constructed to learn the nonlinear correlation mapping between low-resolution and high-resolution video frame patches. A novel spatio-temporal feature similarity calculation method is proposed, which considers both internal video spatio-temporal self-similarity and external clean nonlocal similarity. For the internal spatio-temporal feature self-similarity, we improve the accuracy and robustness of similarity matching by proposing a similarity measure strategy based on spatio-temporal moment feature similarity and structural similarity. The external nonlocal similarity prior constraint is learned by a patch-group-based Gaussian mixture model. The time efficiency of spatio-temporal similarity matching is further improved based on a saliency detection and region correlation judgment strategy, which achieves a better tradeoff between super-resolution accuracy and speed. Experimental results demonstrate that the DLSS-VSR algorithm achieves competitive super-resolution quality compared to other state-of-the-art algorithms in both subjective and objective evaluations.
Video super-resolution (VSR) technology excels in reconstructing low-quality video, avoiding the unpleasant blur effects caused by interpolation-based algorithms. However, vast computational complexity and memory occupation hamper edge deployability and runtime inference in real-life applications, especially for large-scale VSR tasks. This paper explores the possibility of a real-time VSR system and designs an efficient and generic VSR network, termed EGVSR. The proposed EGVSR is based on spatio-temporal adversarial learning for temporal coherence. To pursue faster VSR processing up to 4K resolution, we choose a lightweight network structure and an efficient upsampling method to reduce the computation required by the EGVSR network while guaranteeing high visual quality. Besides, we implement batch-normalization computation fusion, convolutional acceleration algorithms, and other neural-network acceleration techniques on the actual hardware platform to optimize the inference process of the EGVSR network. Finally, our EGVSR achieves real-time processing capacity of 4K@29.61 FPS. Compared with TecoGAN, the most advanced VSR network at present, we achieve an 85.04% reduction in computation density and a 7.92× speedup. In terms of visual quality, the proposed EGVSR tops most metrics (such as LPIPS, tOF, and tLP) on the public test dataset Vid4 and surpasses other state-of-the-art methods in overall performance score.
Most of the existing works in supervised spatio-temporal video super-resolution (STVSR) heavily rely on a large-scale external dataset consisting of paired low-resolution low-frame-rate (LR-LFR) and high-resolution high-frame-rate (HR-HFR) videos. Despite their remarkable performance, these methods make a prior assumption that the low-resolution video is obtained by down-scaling the high-resolution video using a known degradation kernel, which does not hold in practical settings. Another problem with these methods is that they cannot exploit instance-specific internal information of a video at testing time. Recently, deep internal learning approaches have gained attention due to their ability to utilize the instance-specific statistics of a video. However, these methods have a large inference time as they require thousands of gradient updates to learn the intrinsic structure of the data. In this work, we present Adaptive Video Super-Resolution (Ada-VSR) which leverages external, as well as internal, information through meta-transfer learning and internal learning, respectively. Specifically, meta-learning is employed to obtain adaptive parameters, using a large-scale external dataset, that can adapt quickly to the novel condition (degradation model) of the given test video during the internal learning task, thereby exploiting external and internal information of a video for super-resolution. The model trained using our approach can quickly adapt to a specific video condition with only a few gradient updates, which reduces the inference time significantly. Extensive experiments on standard datasets demonstrate that our method performs favorably against various state-of-the-art approaches.
No abstract available
Video super-resolution (VSR) aims to enhance low-resolution videos by leveraging both spatial and temporal information. While deep learning has led to impressive progress, it typically requires centralized data, which raises privacy concerns. Federated learning (FL) offers a privacy-friendly solution, but general FL frameworks often struggle with low-level vision tasks, resulting in blurry, low-quality outputs. To address this, we introduce FedVSR, the first FL framework specifically designed for VSR. It is model-agnostic and stateless, and introduces a lightweight loss function based on the Discrete Wavelet Transform (DWT) to better preserve high-frequency details during local training. Additionally, a loss-aware aggregation strategy combines both DWT-based and task-specific losses to guide global updates effectively. Extensive experiments across multiple VSR models and datasets show that FedVSR not only improves perceptual video quality (up to +0.89 dB PSNR, +0.0370 SSIM, -0.0347 LPIPS and 4.98 VMAF) but also achieves these gains with close to zero computation and communication overhead compared to its rivals. These results demonstrate FedVSR's potential to bridge the gap between privacy, efficiency, and perceptual quality, setting a new benchmark for federated learning in low-level vision tasks. The code is available at: https://github.com/alimd94/FedVSR
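The DWT-based detail loss mentioned above can be illustrated with a differentiable single-level Haar transform. The sub-band weighting and the Charbonnier base term below are assumptions; FedVSR's exact loss and aggregation rule are defined in its repository.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level 2-D Haar DWT via a stride-2 grouped convolution (differentiable),
    returning the LL, LH, HL and HH sub-bands of x (B,C,H,W)."""
    c = x.shape[1]
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).to(x).unsqueeze(1)        # (4, 1, 2, 2)
    kernels = kernels.repeat(c, 1, 1, 1)                              # one filter bank per channel
    out = F.conv2d(x, kernels, stride=2, groups=c)                    # (B, 4C, H/2, W/2)
    return out.reshape(x.shape[0], c, 4, *out.shape[-2:]).unbind(2)   # LL, LH, HL, HH

def wavelet_detail_loss(sr, hr, w_high=1.0):
    """Charbonnier loss plus an extra penalty on the high-frequency sub-bands (LL skipped)."""
    def charbonnier(a, b, eps=1e-6):
        return torch.sqrt((a - b) ** 2 + eps).mean()
    loss = charbonnier(sr, hr)
    for band_sr, band_hr in zip(haar_dwt(sr)[1:], haar_dwt(hr)[1:]):
        loss = loss + w_high * charbonnier(band_sr, band_hr)
    return loss
```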
High-resolution, high-frame-rate videos can record motion scenes in detail and smoothly, but usually only professional cameras have enough transmission bandwidth to meet the video capture requirement. The conventional solutions use video processing methods such as video super-resolution (VSR) and video frame interpolation (VFI), but their results suffer from unreal spatial-temporal details in complex dynamic cases. To address this problem, we reconstruct a more realistic high-resolution, high-frame-rate video using a hybrid video input, including a low-resolution high-frame-rate video (main video) and a high-resolution low-frame-rate video (auxiliary video). We propose a deep learning model named HIS-VSR, which consists of three parts: super-resolution of the main video, detail feature extraction from the auxiliary video, and hybrid video information aggregation. Among them, the first part processes the main video to generate preliminary high-resolution frames; the second part warps the auxiliary frames for alignment and extracts their high-resolution detail features; the last part uses a weighted aggregation method to fuse the results of the first and second parts. We train our model on synthetic datasets and demonstrate its excellent performance in reconstructing dynamic scenes by comparing it with Deep-SloMo on synthetic and real videos.
Video super-resolution (VSR) and frame interpolation (FI) are traditional computer vision problems, and their performance has recently been improving by incorporating deep learning. In this paper, we investigate the problem of jointly upsampling videos both in space and time, which is becoming more important with advances in display systems. One solution for this is to run VSR and FI, one by one, independently. This is highly inefficient as heavy deep neural networks (DNNs) are involved in each solution. To this end, we propose an end-to-end DNN framework for space-time video upsampling by efficiently merging VSR and FI into a joint framework. In our framework, a novel weighting scheme is proposed to fuse input frames effectively without explicit motion compensation for efficient processing of videos. Our framework produces better results both quantitatively and qualitatively, while reducing the computation time (7× faster) and the number of parameters (by 30%) compared to baselines.
Deep learning Video Super-Resolution (VSR) methods rely on learning spatio-temporal correlations between a target frame and its neighbouring frames in a given temporal radius to generate a high-resolution output. Among recent VSR models, a sliding window mechanism is popularly adopted by picking a fixed number of consecutive frames as neighbouring frames for a given target frame. This results in a single frame being used multiple times in the input space during the super-resolution process. Moreover, the approach of adopting fixed consecutive frames directly does not allow deep learning models to learn the full extent of spatio-temporal inter-dependencies between a target frame and its neighbours along a video sequence. To mitigate these issues, this paper proposes a Spatio-Temporal Input Frame Selection (STIFS) algorithm based on image analysis to adaptively select the neighbouring frame(s) based on the spatio-temporal context dynamics with respect to the target frame. STIFS is the first dynamic selection mechanism proposed for VSR methods. It aims to enable VSR models to better learn spatio-temporal correlations in a given temporal radius and consequently maximise the quality of the high-definition output. The proposed STIFS algorithm achieved remarkable PSNR improvements in the high-resolution output for VSR models on benchmark datasets.
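A stand-in for the adaptive selection step might look like the sketch below, which scores candidate neighbours by a cheap histogram similarity. STIFS's actual image-analysis criteria are richer, so treat this only as an illustration of the selection interface.

```python
import numpy as np

def select_neighbour_frames(frames, target_idx, radius=3, k=2):
    """Pick the k neighbouring frames (within +/- radius) whose global statistics are most
    similar to the target frame, using a simple histogram-distance score.
    frames: list of float arrays with values in [0, 1]."""
    def histogram(img, bins=64):
        h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
        return h
    target_hist = histogram(frames[target_idx])
    candidates = [i for i in range(max(0, target_idx - radius),
                                   min(len(frames), target_idx + radius + 1)) if i != target_idx]
    scores = []
    for i in candidates:
        h = histogram(frames[i])
        # negative chi-square distance: higher means a more similar context
        scores.append(-np.sum((h - target_hist) ** 2 / (h + target_hist + 1e-8)))
    order = np.argsort(scores)[::-1]
    return [candidates[j] for j in order[:k]]
```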
The video super-resolution (VSR) problem has developed rapidly along with deep learning methods. However, further progress usually requires increasingly complex architectures. Unlike these works, this paper improves VSR performance from the new perspective of sample difficulty. We propose an exclusive curriculum learning strategy for VSR, which can improve the representation power without noticeable computation increment. Specifically, we track the performance history of every sample and calculate a customized weight for each sample accordingly. In this way, the model can automatically concentrate on the easy samples first and gradually focus on the hard ones. Experimental analysis of the training process and on benchmark datasets demonstrates that our method can substantially boost the performance with a superior convergence speed and a limited number of parameters.
Video super-resolution (VSR) aims to reconstruct a sequence of high-resolution (HR) images from their corresponding low-resolution (LR) versions. Traditionally, solving a VSR problem has been based on iterative algorithms that can exploit prior knowledge on image formation and assumptions on the motion. However, these classical methods struggle at incorporating complex statistics from natural images. Furthermore, VSR has recently benefited from the improvement brought by deep learning (DL) algorithms. These techniques can efficiently learn spatial patterns from large collections of images. Yet, they fail to incorporate some knowledge about the image formation model, which limits their flexibility. Unrolled optimization algorithms, developed for solving inverse problems, allow prior information to be included in deep learning architectures. They have been used mainly for single image restoration tasks. Adapting an unrolled neural network structure brings several benefits: it may increase super-resolution performance, it gives neural networks better interpretability, and it allows a single model to be learned that non-blindly deals with multiple degradations. In this paper, we propose a new VSR neural network based on unrolled optimization techniques and discuss its performance.
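A generic unrolling template of the kind described above alternates a gradient step on the data-fidelity term with a small learned prior network. The downsampling operator, step-size parameterization, and prior CNN below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledSR(nn.Module):
    """Minimal unrolled scheme for SR: each stage takes a gradient step on 0.5*||D(x) - y||^2
    (D = s-fold average-pool downsampling here) and then applies a small learned prior network,
    mimicking a proximal operator."""
    def __init__(self, stages=5, scale=4, channels=3, width=32):
        super().__init__()
        self.scale, self.stages = scale, stages
        self.step = nn.Parameter(torch.full((stages,), 0.5))          # learned per-stage step sizes
        self.prior = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
                          nn.Conv2d(width, channels, 3, padding=1))
            for _ in range(stages)])

    def forward(self, y):                                             # y: LR frame (B, C, h, w)
        x = F.interpolate(y, scale_factor=self.scale, mode="bicubic", align_corners=False)
        for k in range(self.stages):
            # gradient of the data term; the adjoint of average pooling spreads the residual
            # uniformly over each s x s block (hence the division by scale^2)
            residual = F.avg_pool2d(x, self.scale) - y
            grad = F.interpolate(residual, scale_factor=self.scale, mode="nearest") / self.scale ** 2
            x = x - self.step[k] * grad
            x = x + self.prior[k](x)                                  # learned prior / proximal step
        return x
```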
Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depend on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.
We propose a deep reparametrization of the maximum a posteriori formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction. Our approach thereby leverages the advantages of deep learning, while also benefiting from the principled multi-frame fusion provided by the classical MAP formulation. We validate our approach through comprehensive experiments on burst denoising and burst super-resolution datasets. Our approach sets a new state-of-the-art for both tasks, demonstrating the generality and effectiveness of the proposed formulation.
Smartphone cameras have become ubiquitous imaging tools, yet their small sensors and compact optics often limit spatial resolution and introduce distortions. Combining information from multiple low-resolution (LR) frames to produce a high-resolution (HR) image has been explored to overcome the inherent limitations of smartphone cameras. Despite the promise of multi-frame super-resolution (MFSR), current approaches are hindered by datasets that fail to capture the characteristic noise and motion patterns found in real-world handheld burst images. In this work, we address this gap by introducing a novel synthetic data engine that uses multi-exposure static images to synthesize LR-HR training pairs while preserving sensor-specific noise characteristics and image motion found during handheld burst photography. We also propose MFSR-GAN: a multi-scale RAW-to-RGB network for MFSR. Compared to prior approaches, MFSR-GAN emphasizes a “base frame” throughout its architecture to mitigate artifacts. Experimental results on both synthetic and real data demonstrate that MFSR-GAN trained with our synthetic engine yields sharper, more realistic reconstructions than existing methods for real-world MFSR.
With the continuous advancement of space technology, the number of defunct spacecraft, abandoned rocket bodies, and debris in space is increasing. These non-cooperative objects occupy a significant amount of orbital resources and pose a substantial threat to the safety of on-orbit spacecraft. This paper focuses on close-proximity operations in space and aims to address the limitation of camera resolution by proposing an optical flow-based multi-frame super-resolution reconstruction algorithm. This algorithm employs a multi-level wavelet convolutional network (MWCNN) for feature extraction and uses SpyNet to obtain multi-level optical flow between different frames. The multi-level optical flow pyramid alignment network is used to align features, and a recurrent network is utilized for frame-by-frame feature fusion. Finally, a reconstruction network generates high-resolution images. Extensive experiments have demonstrated that our proposed method effectively enhances the perception capabilities of space non-cooperative objects.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
This paper presents a novel application of a Multi-frame Super Resolution (MFSR) method for lunar surface imagery called Lunar HighRes-net (L-HRN). In this work, we adapted and used NASA's Lunar Reconnaissance Orbiter (LRO) image database to train the deep learning architecture for image super-resolution. Additionally, we gathered an artificial image dataset from our virtual Moon to increase the amount of input data for the neural network training process. The network's architecture follows a standard MFSR algorithm that was enhanced for this specific use case. The proposed MFSR method has been evaluated using the well-known peak signal-to-noise ratio (PSNR) metric against other state-of-the-art generic super-resolution methods. This work aims to improve environmental knowledge about the lunar surface and to enhance the capabilities of future autonomous robots on the surface of the Moon.
Satellite image super-resolution is an important task that generates high-resolution satellite images from low-resolution inputs. Multi-frame super-resolution utilizes multiple low-resolution images to generate a single high-resolution image. Multi-frame super-resolution methods face difficulty in handling spatial and temporal dependencies of pixels. In this work, we propose a novel architecture named Multi-context Dense Network (MCDNet) that handles spatial and temporal pixel dependencies using global average pooling, multi-size kernels, and self-attention. The proposed approach improves the PSNR values by 0.29% and 0.001% for super-resolution of the NIR and RED bands on the benchmark PROBA-V dataset.
The small satellite market continues to grow year after year. A compound annual growth rate of 17% is estimated during the period between 2020 and 2025. Low-cost satellites can send a vast number of images to be post-processed on the ground to improve quality and extract detailed information. In this domain lies the resolution enhancement task, where a low-resolution image is converted to a higher resolution automatically. Deep learning approaches to super-resolution (SR) have reached the state of the art on multiple benchmarks; however, most of them were studied in a single-frame fashion. With satellite imagery, multiple frames can be obtained under different conditions, making it possible to add more information per image and improve the final analysis. In this context, we developed, and applied to the PROBA-V dataset of multi-frame satellite images, a model that recently topped the European Space Agency's Multi-frame Super-Resolution (MFSR) competition. The model is based on proven methods from 2D images, tweaked to work in 3D: the Wide Activation Super-Resolution (WDSR) family. We show that with a simple 3D CNN residual architecture with WDSR blocks and a frame permutation technique as the data augmentation, better scores can be achieved than with more complex models. Moreover, the model requires few hardware resources, both for training and evaluation, so it can be applied directly on a personal laptop.
No abstract available
No abstract available
Image super-resolution reconstruction has been widely used in remote sensing, medicine, and other fields. In recent years, with the rise of deep learning research and the successful application of convolutional neural networks to images, super-resolution reconstruction technology based on deep learning has also developed greatly. However, some problems still need to be solved. For example, current mainstream single- and multi-frame image super-resolution algorithms pursue high performance indicators such as PSNR and SSIM, while the reconstructed images are relatively smooth and lack many high-frequency details, which is not conducive to application in real environments. To address this problem, this paper proposes a super-resolution reconstruction model for sequential images based on Generative Adversarial Networks (GANs). The proposed approach uses a registration module to fuse adjacent frames, effectively exploits the detailed information in multiple consecutive frames, and enhances the spatio-temporal information of the low-resolution sequential images. While the GAN is used to improve the reconstruction of high-frequency texture details, WGAN is introduced to optimize model training. The reconstruction results not only improve the PSNR and SSIM indexes but also recover more high-frequency texture details. Finally, to further improve perceptual quality, an additional registration loss term (RLT) is introduced into the GAN perceptual loss. Extensive experiments show that the proposed model effectively exploits the information shared between the sequential images: while improving the PSNR and SSIM indicators, it reconstructs better high-frequency texture details than current advanced multi-frame algorithms.
No abstract available
No abstract available
In this paper, we propose a novel multi-frame super-resolution (SR) method, which is developed by incorporating image enhancement and denoising into the SR process. For image enhancement, a gradient vector flow hybrid field (GVFHF) algorithm, which is robust to noise, is first designed to capture the image edges more accurately. Then, by replacing the gradient of the anisotropic diffusion shock filter (ADSF) with GVFHF, a GVFHF-based ADSF (GVFHF-ADSF) model is proposed, which can effectively achieve image denoising and enhancement. In addition, a difference curvature-based spatial weight factor is defined in the GVFHF-ADSF model to obtain an adaptive weight between denoising and enhancement in the flat and edge regions. Finally, a GVFHF-ADSF-based multi-frame SR method is presented by employing the GVFHF-ADSF model as a regularization term, and the steepest-descent algorithm is adopted to solve the inverse SR problem. Experimental results and comparisons with existing methods demonstrate that the proposed GVFHF-ADSF-based SR algorithm can effectively suppress both Gaussian and salt-and-pepper noise while enhancing the edges of the reconstructed image.
This letter proposes a multi-frame super-resolution framework by combining Demons registration with Bayesian-based regularized reconstruction. For both of our proposals, D-BTVIR and D-IRWIR, the visual analysis shows improvements in regions where the compared methods produced results with either motion-artifacts or over-smoothed aspect due to misregistration. Quantitative results on simulated deformations show an improvement of 3.5%, on average, in PSNR and 8.0%, in SSIM. Finally, results from the Nemenyi test show that D-IRWIR is statistically superior to the other methods we tested, both considering SSIM and PSNR, and D-BTVIR is statistically equivalent while being 7 times faster than the state-of-the-art Bayesian method.
In this study, the authors propose a novel multi-frame super-resolution method using frame selection and multiple fusions for quality enhancement of atmospherically distorted, zoomed-in images. When a small part of an image, captured with a target placed several kilometres away from a fixed camera, is enlarged, its quality becomes poor owing to low resolution, spatial deformations, and noise that are mainly caused by the long distance and atmospheric turbulence. Thus, the authors propose an adaptive frame selection method that selects only a few frames with small blur based on the corresponding images with relatively clear edges. Further, they propose multiple fusion schemes to reconstruct the selected frames, thereby suppressing the influence of deformation. By converting all the selected frames to high resolution with each frame as the reference and integrating them, the multiple-fusion scheme effectively removes deformation and noise without high computational cost. The proposed method, which enhances the quality of atmospherically distorted zoomed-in images, exhibits superior performance to state-of-the-art image super-resolution methods with regard to accuracy, efficiency, and ease of implementation, making it suitable for enhancing the quality of an image captured using a general digital camera or a smartphone.
Since the birth of convolutional neural networks, the application of deep learning technology in image processing has been booming, and deep-learning-based super-resolution is one of the fields attracting the most attention. In the traditional deep learning super-resolution process, the conversion of high-resolution images to low-resolution images is usually obtained by downsampling, but when the actual image degradation does not conform to this process, the performance of the model is usually greatly reduced. Currently, single-frame input is mainly used for image super-resolution, but this usually leads to undesirable results in large-scale reconstruction. This article builds on the SRMD network (a single convolutional super-resolution network with multiple degradations): the key factors of image degradation (blur kernel and noise level) are added to the input of the model, and the measurement matrix commonly used in compressed sensing is used to generate multi-frame images. We propose the MFSR network (Multi-Frames Input Super-Resolution Network with Multiple Degradations) and achieve excellent results on the target dataset.
In the field of computer vision, image super-resolution is a difficult task with many applications in remote sensing, the military, and other areas. In this paper, we introduce the convolutional block attention module (CBAM) into the super-resolution problem, proposing a novel multi-frame super-resolution (MFSR) algorithm based on an attention mechanism. Our proposed MFSR algorithm uses a three-layer CNN as its baseline network and cascades a CBAM at the end of each CNN block. The proposed algorithm delivers a high-resolution output corresponding to the center (3rd) input frame. The average PSNR and SSIM of our algorithm are 33.318 dB and 0.906, respectively, outperforming other MFSR algorithms.
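For reference, a standard CBAM block (channel attention from average- and max-pooled descriptors, followed by 7x7 spatial attention) looks like this in PyTorch; the reduction ratio and kernel size are the commonly used defaults, not necessarily the values used in this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: sequential channel attention (shared MLP over
    average- and max-pooled descriptors) followed by spatial attention (7x7 convolution over
    the channel-wise average and max maps)."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention
        maps = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(maps))
```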
It is difficult to improve image resolution in hardware due to technological limitations and high costs, yet most application fields need high-resolution images, which has motivated super-resolution technology. This paper mainly uses information redundancy to realize multi-frame super-resolution. In recent years, many researchers have proposed a variety of multi-frame super-resolution methods, but in practical applications it is very difficult to preserve image edges and texture details while effectively removing the influence of noise. In this paper, a minimum-variance method is proposed to quickly select low-resolution images of appropriate quality for super-resolution. A half-quadratic function is used as the loss function to minimize the observation error between the estimated high-resolution image and the low-resolution images. The function parameter is determined adaptively according to the observation errors of each low-resolution image. The combination of a local structure tensor and Bilateral Total Variation (BTV) as image prior knowledge preserves the details of the image and suppresses the noise simultaneously. Experimental results on synthetic and real data show that our proposed method preserves image details better than existing methods.
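The Bilateral Total Variation prior mentioned here is a classical multi-frame SR regulariser (Farsiu et al.); a direct NumPy evaluation of it is shown below. The half-quadratic data term and the structure-tensor weighting from the paper are omitted.

```python
import numpy as np

def btv_regularizer(x, p=2, alpha=0.7):
    """Bilateral Total Variation prior:
    R(x) = sum over shifts |l|,|m| <= p (excluding 0,0) of alpha^(|l|+|m|) * ||x - shift(x, l, m)||_1,
    an edge-preserving penalty on a 2-D image array x."""
    value = 0.0
    for l in range(-p, p + 1):
        for m in range(-p, p + 1):
            if l == 0 and m == 0:
                continue
            shifted = np.roll(np.roll(x, l, axis=0), m, axis=1)
            value += alpha ** (abs(l) + abs(m)) * np.abs(x - shifted).sum()
    return value
```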
No abstract available
Gaofen-4 is China’s first geosynchronous orbit high-definition optical imaging satellite with extremely high temporal resolution. The features of staring imaging and high temporal resolution enable the super-resolution of multiple images of the same scene. In this paper, we propose a super-resolution (SR) technique to reconstruct a higher-resolution image from multiple low-resolution (LR) satellite images. The method first performs image registration in both the spatial and range domains. Then the point spread function (PSF) of LR images is parameterized by a Gaussian function and estimated by a blind deconvolution algorithm based on the maximum a posteriori (MAP). Finally, the high-resolution (HR) image is reconstructed by a MAP-based SR algorithm. The MAP cost function includes a data fidelity term and a regularized term. The data fidelity term is in the L2 norm, and the regularized term employs the Huber-Markov prior which can reduce the noise and artifacts while preserving the image edges. Experiments with real Gaofen-4 images show that the reconstructed images are sharper and contain more details than Google Earth ones.
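The MAP cost described above has the generic form of an L2 data-fidelity term plus a Huber-penalised smoothness prior. The sketch below instantiates the Huber-Markov prior on first-order differences, a common choice; the PSF/warping operators are placeholders supplied by the caller, not the paper's estimated operators.

```python
import numpy as np

def huber(t, delta):
    """Huber penalty: quadratic near zero, linear in the tails (edge-preserving)."""
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t ** 2, delta * (a - 0.5 * delta))

def map_cost(hr, lr_frames, downsample, warp_ops, lam=0.01, delta=0.05):
    """MAP objective: sum_k ||D(W_k(hr)) - y_k||_2^2 + lam * Huber prior on image gradients.
    downsample and warp_ops[k] stand in for the estimated PSF/decimation and registration
    operators (callables on 2-D arrays)."""
    data = sum(np.sum((downsample(w(hr)) - y) ** 2) for w, y in zip(warp_ops, lr_frames))
    gx = np.diff(hr, axis=1)                      # horizontal first-order differences
    gy = np.diff(hr, axis=0)                      # vertical first-order differences
    prior = huber(gx, delta).sum() + huber(gy, delta).sum()
    return data + lam * prior
```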
Multi-Frame Super Resolution with Deep Residual Learning on Flow Registered Non-Integer Pixel Images
Super-Resolution (SR) of low-quality images is an important topic of research in the image processing and computer vision fields. Using multiple frames, super-resolution algorithms can reconstruct high-resolution images by incorporating information from subsequent images. Most multi-frame super-resolution techniques either use a traditional, mathematical approach or a deep-learning-based approach that takes optical flow into consideration. In this paper, we develop a way to combine an optical-flow-enabled sub-pixel registration method for mapping onto the high-resolution grid with a deep residual learning approach for restoring features with noise removal. The results exhibit a significant gain over state-of-the-art methods and bicubic interpolation.
The optical resolution of a digital camera is one of its most crucial parameters with broad relevance for consumer electronics, surveillance systems, remote sensing, or medical imaging. However, resolution is physically limited by the optics and sensor characteristics. In addition, practical and economic reasons often stipulate the use of out-dated or low-cost hardware. Super-resolution is a class of retrospective techniques that aims at high-resolution imagery by means of software. Multi-frame algorithms approach this task by fusing multiple low-resolution frames to reconstruct high-resolution images. This work covers novel super-resolution methods along with new applications in medical imaging.
Some biometric methods, especially ocular ones, may use fine spatial information akin to level-3 features. Examples include fine vascular patterns visible in the white of the eyes in green and blue channels, iridial patterns in near infrared, or minute periocular features in visible light. In some mobile applications, an NIR or RGB camera is used to capture these ocular images in a "selfie"-like manner. However, most such ocular images captured in unconstrained environments are of lower quality due to limited spatial resolution, noise, and motion blur, affecting the performance of the ensuing biometric authentication. Here we propose a multi-frame super-resolution (MFSR) pipeline to mitigate the problem, where a higher-resolution image is generated from multiple lower-resolution, noisy, and blurry images. We show that the proposed MFSR method at 2× upscaling can improve the equal error rate (EER) by 9.85% compared to single-frame bicubic upscaling in RGB ocular matching, while being up to 8.5× faster than comparable state-of-the-art MFSR methods.
Multi-frame super-resolution recovers a high-resolution (HR) image from a sequence of low-resolution (LR) images. In this paper, we propose an algorithm that performs multi-frame super-resolution in an online fashion. This algorithm processes only one low-resolution image at a time instead of co-processing all LR images which is adopted by state-of-the-art super-resolution techniques. Our algorithm is very fast and memory efficient, and simple to implement. In addition, we employ a noise-adaptive parameter in the classical steepest gradient optimization method to avoid noise amplification and overfitting LR images. Experiments with simulated and real-image sequences yield promising results.
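One online update of the kind described above (a single steepest-descent step per newly arrived LR frame, damped so the residual is not pushed below the noise floor) might look like this. The damping heuristic and the operator placeholders are assumptions for illustration, not the paper's exact noise-adaptive parameter.

```python
import numpy as np

def online_sr_update(hr, lr_frame, degrade, degrade_adjoint, step=1.0, noise_level=0.01):
    """Refine the running HR estimate using only the newly arrived LR frame.
    degrade maps an HR image to the LR grid (warp + blur + decimation); degrade_adjoint
    is its adjoint. Both are caller-supplied placeholders here."""
    residual = degrade(hr) - lr_frame                      # simulated LR minus observed LR
    # damp the step once the residual energy approaches the expected noise energy,
    # to avoid amplifying noise or overfitting this single frame (heuristic)
    damping = max(0.0, 1.0 - noise_level ** 2 * lr_frame.size / (np.sum(residual ** 2) + 1e-12))
    return hr - step * damping * degrade_adjoint(residual)
```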
With the increased use of closed-circuit television (CCTV) footage for security and surveillance purposes as well as for object or person recognition and efficiency monitoring, high-quality CCTV videos are necessary. In this paper, we propose Corgi Eye, a moving object removal + super-resolution framework for enhancing CCTV footage and removing ghosting artifacts caused by performing multi-frame super-resolution (MISR) on moving objects. Our method extends the framework of Eagle Eye, which is an existing MISR framework tailored for mobile devices. Our results demonstrate that the system can completely remove ghosting effects caused by moving objects while performing MISR on CCTV footage. Our proposed method demonstrates competitive performance when compared to Eagle Eye, achieving a 16% increase in terms of the PSNR metric. Additionally, our method can produce clear images, on par with deep learning approaches such as ESPCN and SOF-VSR.
This paper addresses the challenge of keeping up with the ever-increasing graphical complexity of video games and introduces a deep-learning approach to mitigating it. As games become more demanding in terms of graphics, it becomes increasingly difficult to maintain high image quality while also ensuring good performance. This is where deep learning super sampling (DLSS) comes in. The paper explains how DLSS works, including the use of convolutional autoencoder neural networks and various other techniques and technologies. It also covers how the network is trained and optimized, as well as how it incorporates temporal antialiasing and frame generation techniques to enhance the final image quality. We also discuss the effectiveness of these techniques and compare their performance to rendering at native resolution.
No abstract available
The rapid advancement of remote sensing technology has driven a growing demand for high-quality satellite video. However, constrained by transmission bandwidth and storage costs, satellite videos are typically stored and transmitted at low spatial resolutions and frame rates. As a key solution, space–time super-resolution (STSR) aims to enhance video quality. Unfortunately, existing satellite video STSR methods fail to effectively model interframe continuity and, consequently, only support a fixed upsampling scale, which restricts their flexibility and practical applicability. To address this challenge, we propose a novel continuous STSR (CSTSR) framework for satellite videos, termed SV-CSTSR. Specifically, first, we develop a spatial–frequency joint modulation block (SFJMB), which aims to jointly mine the fine-grained features of satellite videos in the spatial and frequency domains by using a dual-branch structure. Second, we propose a mask-based temporal-aware warping module (MTWM) to model continuous feature representations in the temporal dimension by modulating feature warping with a time factor, enabling frame interpolation at arbitrary times. Third, we design an interframe deformable attention module (IDAM), which can explore interframe dependencies to achieve effective information fusion by adaptively selecting the positions of key and value pairs in a data-dependent manner. Finally, we propose a cross-level frequency integration module (CFIM) for continuous-scale upsampling. CFIM aims to activate channel responses through frequency selection (FS), thereby achieving adaptive integration of cross-level frequency latent codes to learn feature representations of satellite videos at arbitrary resolution. Extensive experiments demonstrate that the proposed SV-CSTSR surpasses state-of-the-art (SOTA) methods, achieving superior quantitative accuracy and perceptual fidelity.
In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: a spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
In many digital systems, the transmission bandwidth as well as the storage capacity are usually very limited. This introduces challenges for both video transmission and video storage. To reach lower bit rates and still obtain high-quality upsampled videos, this paper proposes a temporal-downsampling-based video coding system and a frame-recurrent-enhancement-based video upsampling strategy. The structure of our proposed method is shown in Fig. 1. Unlike the existing work [1], instead of downsampling all video frames, only the intermediate frames are downsampled, and two frames are kept at high quality in the video coding system. These two high-quality frames are then used to iteratively enhance the quality of the low-bitrate, low-quality frames through a deep-learned enhancement network. Compared to the latest video coding standard, Versatile Video Coding (VVC), our work obtains a BD-rate reduction of 39.261% to 85.455% in the All-Intra and Low-Delay-P configurations on the downsampled frames. A temporal-downsampling-based video coding framework (TDS) is proposed; it can be combined with all existing coding standards, including HEVC/H.265 and VVC/H.266. A method of super-resolution with frame-recurrent image enhancement (SRFR) is applied to upsample the frames using the neighboring high-resolution frames. The temporal information from high-resolution frames can thus be fully used to improve the video quality through frame recurrence.
In this paper, we consider the task of space-time video super-resolution (ST-VSR), namely, expanding a given source video to a higher frame rate and resolution simultaneously. However, most existing schemes either consider a fixed intermediate time and scale in the training stage or only accept a preset number of input frames (e.g., two adjacent frames) that fails to exploit long-range temporal information. To address these problems, we propose a continuous ST-VSR (C-STVSR) method that can convert the given video to any frame rate and spatial resolution. To achieve time-arbitrary interpolation, we propose a forward warping guided frame synthesis module and an optical-flow-guided context consistency loss to better approximate extreme motion and preserve similar structures among input and prediction frames. In addition, we design a memory-friendly cascading depth-to-space module to realize continuous spatial upsampling. Meanwhile, with the sophisticated reorganization of optical flow, the proposed method is memory friendly, making it possible to propagate information from long-range neighboring frames and achieve better reconstruction quality. Extensive experiments show that the proposed algorithm has good flexibility and achieves better performance on various datasets compared with the state-of-the-art methods in both objective evaluations and subjective visual effects.
Downsampling is one of the most basic image processing operations. Improper spatio-temporal downsampling applied to videos can cause aliasing issues such as moiré patterns in space and the wagon-wheel effect in time. Consequently, the inverse task of upscaling a low-resolution, low-frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artifacts. In this paper, we aim to solve the space-time aliasing problem by learning a spatio-temporal downsampler. Towards this goal, we propose a neural network framework that jointly learns spatio-temporal downsampling and upsampling. It enables the downsampler to retain the key patterns of the original video and maximizes the reconstruction performance of the upsampler. To make the downsampling results compatible with popular image and video storage formats, they are encoded to uint8 with a differentiable quantization layer. To fully utilize the space-time correspondences, we propose two novel modules for explicit temporal propagation and space-time feature rearrangement. Experimental results show that our proposed method significantly boosts the space-time reconstruction quality by preserving spatial textures and motion patterns in both downsampling and upscaling. Moreover, our framework enables a variety of applications, including arbitrary video resampling, blurry frame reconstruction, and efficient video storage.
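As a rough illustration of how quantization can be made trainable, the sketch below implements a generic straight-through uint8 quantizer; this is a common construction and only an assumption about how such a layer might look, not the paper's actual module.

```python
# Minimal sketch of a differentiable uint8 quantization layer using a
# straight-through estimator, assuming inputs already lie in [0, 1].
import torch

class QuantizeUint8(torch.nn.Module):
    def forward(self, x):
        x = x.clamp(0.0, 1.0)
        q = torch.round(x * 255.0) / 255.0          # snap to the 256 8-bit levels
        # Straight-through: forward pass uses q, backward pass uses identity.
        return x + (q - x).detach()

# Usage
layer = QuantizeUint8()
frames = torch.rand(1, 3, 64, 64, requires_grad=True)
out = layer(frames)
out.mean().backward()          # gradients flow as if quantization were identity
```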
A video compression framework based on spatio-temporal resolution adaptation (ViSTRA) is proposed, which dynamically resamples the input video spatially and temporally during encoding, based on a quantisation-resolution decision, and reconstructs the full resolution video at the decoder. Temporal upsampling is performed using frame repetition, whereas a convolutional neural network super-resolution model is employed for spatial resolution upsampling. ViSTRA has been integrated into the high efficiency video coding reference software (HM 16.14). Experimental results verified via an international challenge show significant improvements, with BD-rate gains of 15% based on PSNR and an average MOS difference of 0.5 based on subjective visual quality tests.
In the domain of space-time video super-resolution, it is typically challenging to handle complex motions (including large and nonlinear motions) and varying illumination scenes due to the lack of inter-frame information. Leveraging the dense temporal information provided by event signals offers a promising solution. Traditional event-based methods typically rely on multiple images, using motion estimation and compensation, which can introduce errors. Accumulated errors from multiple frames often lead to artifacts and blurriness in the output. To mitigate these issues, we propose EvSTVSR, a method that uses fewer adjacent frames and integrates dense temporal information from events to guide alignment. Additionally, we introduce a coordinate-based feature fusion upsampling module to achieve spatial super-resolution. Experimental results demonstrate that our method not only outperforms existing RGB-based approaches but also excels in handling large motion scenarios.
Space-time video resampling aims to conduct both spatial-temporal downsampling and upsampling processes to achieve high-quality video reconstruction. Although there has been much progress, some major challenges still exist, such as how to preserve motion information during temporal resampling while avoiding blurring artifacts, and how to achieve flexible temporal and spatial resampling factors. In this paper, we introduce an Invertible Motion Steganography Module (IMSM), designed to embed motion information from high-frame-rate videos into downsampled frames with lower frame rates in a visually imperceptible manner. Its reversible nature allows the motion information to be recovered, facilitating the reconstruction of high-frame-rate videos. Furthermore, we propose a 3D implicit feature modulation technique that enables continuous spatiotemporal resampling. With tailored training strategies, our method supports flexible frame rate conversions, including non-integer changes like 30 FPS to 24 FPS and vice versa. Extensive experiments show that our method significantly outperforms existing solutions across multiple datasets in various video resampling tasks with high flexibility. Codes will be made available at the URL https://github.com/hahazh/CSTVR.
As internet video evolves towards higher quality, it poses challenges to the Quality of Experience (QoE) of adaptive streaming systems. To deliver high visual quality while avoiding rebuffering, we propose Gecko, an adaptive streaming system based on Prompt Inversion. At the media server, high-quality video is inverted to low-bitrate prompts. At the client, the received prompts are used to reconstruct high-fidelity video. To support videos with large-scale movements, a temporal-structural prompt is proposed to explicitly control temporal changes. To support high resolutions, an inverse upsampling algorithm is introduced, which integrates upsampling into the inversion. To further reduce bandwidth usage, a chunk-wise inverse prompt is proposed. We implement Gecko on Puffer, with fine-grained integration of both the browser client and the media server. Evaluations under real-world network traces demonstrate that Gecko can reduce bandwidth usage by 10x compared to H.264 and reduce rebuffering by 91.2% compared to DVC. Moreover, Gecko can generate 4K videos at 69 FPS with a single RTX 4090D GPU.
Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we propose a strong baseline BasicAVSR for AVSR by integrating four key components: 1) adaptive multi-scale frequency priors generated from image Laplacian pyramids, 2) a flow-guided propagation unit to aggregate spatiotemporal information from adjacent frames, 3) a second-order motion compensation unit for more accurate spatial alignment of adjacent frames, and 4) a hyper-upsampling unit to generate scale-aware and content-independent upsampling kernels. To meet diverse application demands, we instantiate three propagation variants: (i) a unidirectional RNN unit for strictly online inference, (ii) a unidirectional RNN unit empowered with a limited lookahead that tolerates a small output delay, and (iii) a bidirectional RNN unit designed for offline tasks where computational resources are less constrained. Experimental results demonstrate the effectiveness and adaptability of our model across these different scenarios. Through extensive experiments, we show that BasicAVSR significantly outperforms existing methods in terms of super-resolution quality, generalization ability, and inference speed. Our work not only advances the state-of-the-art in AVSR but also extends its core components to multiple frameworks for diverse scenarios. The code is available at https://github.com/shangwei5/BasicAVSR.
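The flow-guided propagation unit mentioned above builds on backward warping of features with optical flow; the sketch below shows this generic warping operation (a common building block, written here with illustrative tensor shapes rather than the paper's code).

```python
# Minimal sketch of flow-guided (backward) warping with grid_sample, the basic
# operation behind flow-guided propagation; tensors and flow values are illustrative.
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Warp a feature map (N, C, H, W) with optical flow (N, 2, H, W) given in pixels."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)      # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                # sampling positions
    # Normalize to [-1, 1] for grid_sample, which expects (N, H, W, 2) as (x, y).
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)
    return F.grid_sample(feat, grid_norm, align_corners=True)

# Usage: align features from the previous frame to the current one.
prev_feat = torch.rand(1, 16, 32, 32)
flow = torch.zeros(1, 2, 32, 32)        # zero flow -> identity warp
cur_aligned = flow_warp(prev_feat, flow)
```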
No abstract available
In this paper, we present VideoGen, a text-to-video generation approach that can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion. We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt, as a reference image to guide video generation. Then, we introduce an efficient cascaded latent diffusion module conditioned on both the reference image and the text prompt for generating latent video representations, followed by a flow-based temporal upsampling step to improve the temporal resolution. Finally, we map the latent video representations into a high-definition video through an enhanced video decoder. During training, we use the first frame of a ground-truth video as the reference image for training the cascaded latent diffusion module. The main characteristics of our approach are: the reference image generated by the text-to-image model improves the visual fidelity; using it as the condition makes the diffusion model focus more on learning the video dynamics; and the video decoder is trained on unlabeled video data, thus benefiting from high-quality, easily available videos. VideoGen sets a new state of the art in text-to-video generation in terms of both qualitative and quantitative evaluation. See https://videogen.github.io/VideoGen/ for more samples.
Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we first describe a strong baseline for AVSR by putting together three variants of elementary building blocks: 1) a flow-guided recurrent unit that aggregates spatiotemporal information from previous frames, 2) a flow-refined cross-attention unit that selects spatiotemporal information from future frames, and 3) a hyper-upsampling unit that generates scale-aware and content-independent upsampling kernels. We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network. This prior has proven effective in discriminating structure and texture across different locations and scales, which is beneficial for AVSR. Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art. The code is available at https://github.com/shangwei5/ST-AVSR.
Video rescaling helps to fit different display devices. In video rescaling systems, videos are downsampled for easier storage, transmission, and preview. The downsampled videos can be upsampled with a neural network to restore the details when needed. Previous group-based video rescaling algorithms benefit from the joint downsampling and joint upsampling of multiple frames, but are restricted by the fully joint operation. In this paper, we propose a recurrent diffusion-based framework for video rescaling. We employ biased joint operation and recurrent diffusion to make better use of the temporal relations among the frames in each image group. We explicitly control the direction of information propagation by arranging the processing order of all frames. In the biased joint operation, we concentrate on restoring one frame, i.e., the middle frame, while the other frames in the group are coarsely reconstructed. Our recurrent diffusion compensates the coarse frames by gradually propagating information from the middle frame to the border frames, backward and forward. The recurrent diffusion module is performed by fusing the information of adjacent frames. Biased joint operation and recurrent diffusion are jointly trained. We design several propagation variants and find that our recurrent diffusion is the best among them. It is also shown that recurrent diffusion is better than non-recurrent diffusion in terms of reconstruction quality and model size. We also adopt a high-resolution fine-tuning strategy to further improve the quality of high-resolution frames. Experimental results demonstrate the effectiveness of the proposed method in terms of visual quality, quantitative evaluations, and computational efficiency. The code will be released at https://github.com/5ofwind/RDVR.
We propose a Dynamic Context-Guided Upsampling (DCGU) module for video super-resolution (VSR) that leverages temporal context guidance to achieve efficient and effective arbitrary-scale VSR. While most VSR research focuses on backbone design, the importance of the upsampling part is often overlooked. Existing methods rely on pixelshuffle-based upsampling, which has limited capability to handle arbitrary upsampling scales. Recent attempts to replace pixelshuffle-based modules with implicit neural function-based and filter-based approaches suffer from slow inference speeds and limited representation capacity, respectively. To overcome these limitations, our DCGU module predicts non-local sampling locations and content-dependent filter weights, enabling efficient and effective arbitrary-scale VSR. Our proposed multi-granularity location search module efficiently identifies non-local sampling locations across the entire low-resolution grid, and the temporal bilateral filter modulation module integrates content information with the filter weights to enhance textural details. Extensive experiments demonstrate the superiority of our method in terms of performance and speed on arbitrary-scale VSR.
Space-time video super-resolution (ST-VSR) aims to simultaneously expand a given source video to a higher frame rate and resolution. However, most existing schemes either consider fixed intermediate time and scale or fail to exploit long-range temporal information due to model design or inefficient motion estimation and compensation. To address these problems, we propose a continuous ST-VSR method to convert the given video to any frame rate and spatial resolution with Multi-stage Motion information reorganization (MsMr). To achieve time-arbitrary interpolation, we propose a forward warping guided frame synthesis module and an optical flow-guided context consistency loss to better approximate extreme motion and preserve similar structures among input and prediction frames. To realize continuous spatial upsampling, we design a memory-friendly cascading depth-to-space module. Meanwhile, with the sophisticated reorganization of optical flow, MsMr realizes more efficient motion estimation and motion compensation, making it possible to propagate information from long-range neighboring frames and achieve better reconstruction quality. Extensive experiments show that the proposed algorithm is flexible and performs better on various datasets than the state-of-the-art methods. The code will be available at https://github.com/hahazh/LD-STVSR.
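Both this entry and the related C-STVSR entry above rely on a memory-friendly cascading depth-to-space module for spatial upsampling; the sketch below shows a generic cascaded pixel-shuffle upsampler of that kind, with illustrative channel counts and layer choices rather than the published architecture.

```python
# Minimal sketch of a cascaded depth-to-space (pixel-shuffle) upsampler:
# two x2 stages instead of one x4 stage; all sizes are illustrative.
import torch
import torch.nn as nn

class CascadedDepthToSpace(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(channels, channels * 4, 3, padding=1),
                                    nn.PixelShuffle(2), nn.LeakyReLU(0.1, inplace=True))
        self.stage2 = nn.Sequential(nn.Conv2d(channels, channels * 4, 3, padding=1),
                                    nn.PixelShuffle(2), nn.LeakyReLU(0.1, inplace=True))
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feat):
        return self.to_rgb(self.stage2(self.stage1(feat)))

# Usage
up = CascadedDepthToSpace()
hr = up(torch.rand(1, 64, 32, 32))      # -> (1, 3, 128, 128)
```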
No abstract available
Diffusion models (DMs) have demonstrated exceptional success in video super-resolution (VSR), showcasing a powerful capacity for generating fine-grained details. However, their potential for space-time video super-resolution (STVSR), which necessitates not only recovering realistic visual content from low-resolution to high-resolution but also improving the frame rate with coherent temporal dynamics, remains largely underexplored. Moreover, existing STVSR methods predominantly address spatiotemporal upsampling under simplified degradation assumptions, which often struggle in real-world scenarios with complex unknown degradations. Such a high demand for reconstruction fidelity and temporal consistency makes the development of a robust STVSR framework particularly non-trivial. To address these challenges, we propose OSDEnhancer, a novel framework that, to the best of our knowledge, represents the first method to achieve real-world STVSR through an efficient one-step diffusion process. OSDEnhancer initializes essential spatiotemporal structures through a linear pre-interpolation strategy and pivots on training temporal refinement and spatial enhancement mixture of experts (TR-SE MoE), which allows distinct expert pathways to progressively learn robust, specialized representations for temporal coherence and spatial detail, further collaboratively reinforcing each other during inference. A bidirectional deformable variational autoencoder (VAE) decoder is further introduced to perform recurrent spatiotemporal aggregation and propagation, enhancing cross-frame reconstruction fidelity. Experiments demonstrate that the proposed method achieves state-of-the-art performance while maintaining superior generalization capability in real-world scenarios.
Video super-resolution often reconstructs high-resolution (HR) video from low-resolution (LR) video that has been downsampled using predefined methods, which is an ill-posed problem. Recent video rescaling algorithms alleviate this problem by jointly training the downsampling and upsampling processes. However, they primarily exploit shallow temporal correlations among video frames, overlooking the intricate, long-term sequential depth dependencies within the video. In this paper, we propose an omniscient feature alignment to leverage bidirectional deep temporal information for video rescaling, namely OFA-VRN. In the downsampling phase, the proposed method separates the input HR video into LR frames and high-frequency components using the Haar wavelet transform and explicitly embeds the high-frequency components into the LR frames. In this way, detail information is stored in the frames while the downsampled videos maintain visual perceptual quality. During the upsampling phase, we use an advanced bidirectional propagation paradigm to enhance temporal information aggregation capabilities. By incorporating the proposed omniscient feature alignment, the network is capable of leveraging multi-frame feature information from the triplet dimension to further alleviate misalignment issues, thereby enhancing its capacity for deep temporal information utilization. The experiments on Vid4 and Vimeo90K-T demonstrate that our model achieves competitive performance compared to the state-of-the-art methods.
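For reference, the single-level 2D Haar transform that underlies this kind of LR/high-frequency split can be written in a few lines; the sketch below is a generic, exactly invertible implementation and makes no claim about how OFA-VRN embeds the detail bands.

```python
# Minimal sketch of a single-level 2D Haar transform that splits a frame into a
# half-resolution approximation (the "LR frame") and three high-frequency bands.
import numpy as np

def haar2d(frame):
    """frame: (H, W) with even H and W. Returns (LL, LH, HL, HH)."""
    a = frame[0::2, 0::2]
    b = frame[0::2, 1::2]
    c = frame[1::2, 0::2]
    d = frame[1::2, 1::2]
    ll = (a + b + c + d) / 2.0      # approximation (downsampled content)
    lh = (a - b + c - d) / 2.0      # detail band
    hl = (a + b - c - d) / 2.0      # detail band
    hh = (a - b - c + d) / 2.0      # detail band
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse transform, reconstructing the original frame exactly."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

x = np.random.rand(8, 8)
assert np.allclose(ihaar2d(*haar2d(x)), x)
```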
Video restoration task aims to recover high-quality videos from low-quality observations. This contains various important sub-tasks, such as video denoising, deblurring and low-light enhancement, since video often faces different types of degradation, such as blur, low light, and noise. Even worse, these kinds of degradation could happen simultaneously when taking videos in extreme environments. This poses significant challenges if one wants to remove these artifacts at the same time. In this paper, to the best of our knowledge, we are the first to propose an efficient end-to-end video transformer approach for the joint task of video deblurring, low-light enhancement, and denoising. This work builds a novel multi-tier transformer where each tier uses a different level of degraded video as a target to learn the features of video effectively. Moreover, we carefully design a new tier-to-tier feature fusion scheme to learn video features incrementally and accelerate the training process with a suitable adaptive weighting scheme. We also provide a new Multiscene-Lowlight-Blur-Noise (MLBN) dataset, which is generated according to the characteristics of the joint task based on the RealBlur dataset and YouTube videos to simulate realistic scenes as far as possible. We have conducted extensive experiments, compared with many previous state-of-the-art methods, to show the effectiveness of our approach clearly.
The exploitation of long-term information has been a long-standing problem in video restoration. The recent BasicVSR and BasicVSR++ have shown remarkable performance in video super-resolution through long-term propagation and effective alignment. Their success has led to a question of whether they can be transferred to different video restoration tasks. In this work, we extend BasicVSR++ to a generic framework for video restoration tasks. In tasks where inputs and outputs possess identical spatial size, the input resolution is reduced by strided convolutions to maintain efficiency. With only minimal changes from BasicVSR++, the proposed framework achieves compelling performance with great efficiency in various video restoration tasks including video deblurring and denoising. Notably, BasicVSR++ achieves comparable performance to Transformer-based approaches with up to 79% of parameter reduction and 44x speedup. The promising results demonstrate the importance of propagation and alignment in video restoration tasks beyond just video super-resolution. Code and models are available at https://github.com/ckkelvinchan/BasicVSR_PlusPlus.
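The efficiency trick described here (reducing the input resolution with strided convolutions when input and output share the same spatial size, then restoring it at the end) can be sketched generically as below; the channel counts, depth, and residual connection are illustrative assumptions, not the BasicVSR++ configuration.

```python
# Minimal sketch: strided convolutions shrink features for same-size restoration
# tasks, a small body processes them, and pixel shuffle restores the output size.
import torch
import torch.nn as nn

class SameSizeRestorer(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.LeakyReLU(0.1, True),
                                  nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.LeakyReLU(0.1, True))
        self.body = nn.Sequential(*[nn.Conv2d(ch, ch, 3, padding=1) for _ in range(4)])
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * 16, 3, padding=1), nn.PixelShuffle(4))

    def forward(self, x):
        return x + self.up(self.body(self.down(x)))       # residual restoration

out = SameSizeRestorer()(torch.rand(1, 3, 64, 64))        # output keeps the 64x64 size
```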
Video restoration aims to reconstruct high-quality video sequences from low-quality inputs, addressing tasks such as super-resolution, denoising, and deblurring. Traditional regression-based methods often produce unrealistic details and require extensive paired datasets, while recent generative diffusion models face challenges in ensuring temporal consistency. We introduce DiTVR, a zero-shot video restoration framework that couples a diffusion transformer with trajectory-aware attention and a wavelet-guided, flow-consistent sampler. Unlike prior 3D convolutional or frame-wise diffusion approaches, our attention mechanism aligns tokens along optical flow trajectories, with particular emphasis on vital layers that exhibit the highest sensitivity to temporal dynamics. A spatiotemporal neighbour cache dynamically selects relevant tokens based on motion correspondences across frames. The flow-guided sampler injects data consistency only into low-frequency bands, preserving high-frequency priors while accelerating convergence. DiTVR establishes a new zero-shot state of the art on video restoration benchmarks, demonstrating superior temporal consistency and detail preservation while remaining robust to flow noise and occlusions.
Video distortion seriously affects user experience and downstream tasks. Existing video restoration methods still suffer from high-frequency detail loss, limited spatio-temporal dependency modeling, and high computational complexity. In this letter, we propose a novel video restoration method based on full-frequency spatio-temporal information enhancement (FFSTIE). The proposed FFSTIE includes an implicit alignment module for accurate recovery of high-frequency details and a full-frequency feature reconstruction module for adaptive enhancement of frequency components. Comprehensive experiments with quantitative and qualitative comparisons demonstrate the effectiveness of our FFSTIE method. On the video deblurring dataset DVD, FFSTIE achieves 0.75% improvement in PSNR and 1.08% improvement in SSIM with 35% fewer parameters and 59% lower GMAC compared to VDTR (TCSVT'2023), achieving a balance between performance and efficiency. On the video denoising dataset DAVIS, FFSTIE achieves the best performance with an average of 35.36 PSNR and 0.9347 SSIM, surpassing existing unsupervised methods.
Dynamic scene video deblurring is a challenging task due to the spatially variant blur inflicted by independently moving objects and camera shakes. Recent deep learning works bypass the ill-posedness of explicitly deriving the blur kernel by learning pixel-to-pixel mappings, which is commonly enhanced by larger region awareness. This is a difficult yet simplified scenario because noise is neglected when it is omnipresent in a wide spectrum of video processing applications. Despite its relevance, the problem of concurrent noise and dynamic blur has not yet been addressed in the deep learning literature. To this end, we analyze existing state-of-the-art deblurring methods and encounter their limitations in handling non-uniform blur under strong noise conditions. Thereafter, we propose a first-to-date work that addresses blur- and noise-free frame recovery by casting the restoration problem into a multi-task learning framework. Our contribution is threefold: a) We propose R2-D4, a multi-scale encoder architecture attached to two cascaded decoders performing the restoration task in two steps. b) We design multi-scale residual dense modules, bolstered by our modulated efficient channel attention, to enhance the encoder representations via augmenting deformable convolutions to capture longer-range and object-specific context that assists blur kernel estimation under strong noise. c) We perform extensive experiments and evaluate state-of-the-art approaches on a publicly available dataset under different noise levels. The proposed method performs favorably under all noise levels while retaining a reasonably low computational and memory footprint.
To address the problem that restoring blurred digital video easily loses inter-frame information and ignores spatiotemporal correlations, a video deblurring algorithm based on a denoising engine is proposed. We extend the adaptive Laplacian regularization term constructed by the denoising engine to the field of video restoration. First, self-similar redundant information in the video is captured through nonlocal means (NLM) regularization; we then present a new restoration model that mixes different regularizers, in particular combining the NLM regularizer with the denoising regularizer. To solve the video restoration model, we use the simple gradient descent method. The experimental results show that our method achieves a good deblurring effect and a certain robustness to noise.
Video processing is essential in entertainment, surveillance, and communication. This research presents a strong framework that improves video clarity and decreases bitrate via advanced restoration and compression methods. The suggested framework merges various deep learning models such as super-resolution, deblurring, denoising, and frame interpolation, in addition to a competent compression model. Video frames are first compressed using the libx265 codec in order to reduce bitrate and storage needs. After compression, restoration techniques deal with issues like noise, blur, and loss of detail. The video restoration transformer (VRT) uses deep learning to greatly enhance video quality by reducing compression artifacts. The frame resolution is improved by the super-resolution model, motion blur is fixed by the deblurring model, and noise is reduced by the denoising model, resulting in clearer frames. Frame interpolation creates additional frames between existing frames to create a smoother video viewing experience. Experimental findings show that this system successfully improves video quality and decreases artifacts, providing better perceptual quality and fidelity. The real-time processing capabilities of the technology make it well-suited for use in video streaming, surveillance, and digital cinema.
This paper presents a novel method for restoring digital videos via a Deep Plug-and-Play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying that method to restore a digital video from a degraded video observation. This way, a network trained once for denoising can be repurposed for other video restoration tasks. Our experiments in video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit to using a network specifically designed for video denoising, as it yields better restoration performance and better temporal stability than a single image network with similar denoising performance using the same PnP formulation. Moreover, our method compares favorably to applying a different state-of-the-art PnP scheme separately on each frame of the sequence. This opens new perspectives in the field of video restoration.
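The alternating scheme described here follows the standard plug-and-play pattern of interleaving a data-fidelity update with a denoising step; the sketch below illustrates that pattern on a toy deblurring problem, with a plain Gaussian filter standing in for the deep video denoiser and all parameters chosen for illustration only.

```python
# Minimal sketch of a plug-and-play iteration: a denoiser replaces the prior's
# proximal operator; the degradation is a known Gaussian blur, purely illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def blur(x, sigma=2.0):
    return gaussian_filter(x, sigma)

def pnp_restore(y, n_iters=30, step=1.0, denoise_sigma=1.0):
    x = y.copy()
    for _ in range(n_iters):
        # Data-fidelity step: gradient descent on ||blur(x) - y||^2
        # (the adjoint of a symmetric Gaussian blur is the blur itself).
        x = x - step * blur(blur(x) - y)
        # Prior step: plug a denoiser in place of the proximal operator.
        x = gaussian_filter(x, denoise_sigma)
    return x

y = gaussian_filter(np.random.rand(64, 64), 2.0)     # toy degraded observation
restored = pnp_restore(y)
```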
Long-range temporal alignment is critical yet challenging for video restoration tasks. Recently, some works attempt to divide the long-range alignment into several sub-alignments and handle them progressively. Although this operation is helpful in modeling distant correspondences, error accumulation is inevitable due to the propagation mechanism. In this work, we present a novel, generic iterative alignment module which employs a gradual refinement scheme for sub-alignments, yielding more accurate motion compensation. To further enhance the alignment accuracy and temporal consistency, we develop a non-parametric re-weighting method, where the importance of each neighboring frame is adaptively evaluated in a spatial-wise way for aggregation. By virtue of the proposed strategies, our model achieves state-of-the-art performance on multiple benchmarks across a range of video restoration tasks including video super-resolution, denoising and deblurring.
No abstract available
No abstract available
Employing specific networks to address different types of degradation often proves complex and time-consuming in practical applications. Bracket Image Restoration and Enhancement (BIRE) aims to address various image restoration tasks in a unified manner by restoring clear single-frame images from multi-frame shots, including denoising, deblurring, high dynamic range (HDR) enhancement, and super-resolution under various degradation conditions. In this paper, we propose LGSTANet, an efficient aggregation restoration network for BIRE. Specifically, inspired by video restoration methods, we adopt an efficient architecture comprising alignment, aggregation, and reconstruction. Additionally, we introduce a Learnable Global Spatio-Temporal Adaptive (LGSTA) aggregation module to effectively aggregate inter-frame complementary information. Furthermore, we propose an adaptive restoration modulator to address specific degradation disturbances of various types, thereby achieving high-quality restoration outcomes. Extensive experiments demonstrate the effectiveness of our method. LGSTANet outperforms other state-of-the-art methods in Bracket Image Restoration and Enhancement and achieves competitive results in the NTIRE2024 BIRE challenge.
No abstract available
No abstract available
Pixel recovery with deep learning has proven very effective for a variety of low-level vision tasks such as image super-resolution, denoising, and deblurring. Most existing works operate in the spatial domain, and there are few works that exploit the transform domain for image restoration tasks. In this paper, we present a transform-domain approach for image deblocking using a deep neural network called DCTResNet. Our application is compressed-video motion deblur, where the input video frame has blocking artifacts that make the deblurring task very challenging. Specifically, we use a block-wise Discrete Cosine Transform (DCT) to decompose the image into its low- and high-frequency sub-band images and exploit the strong sub-band-specific features for more effective deblocking solutions. Since JPEG also uses the DCT for image compression, using DCT sub-band images for image deblocking helps the network learn the JPEG compression prior and thus correct the blocking artifacts effectively. Our experimental results show that DCTResNet performs more favorably than other state-of-the-art (SOTA) methods in both PSNR and SSIM, while being significantly faster at inference time.
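For illustration, a block-wise DCT sub-band split of the kind described can be sketched as follows; the 8x8 block size, the 4x4 low-frequency corner, and the exact partitioning are assumptions for the example, not details taken from DCTResNet.

```python
# Minimal sketch of block-wise 8x8 DCT analysis, splitting a frame into low- and
# high-frequency sub-band images; block size and cutoff are illustrative.
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct_subbands(img, block=8, low_size=4):
    """Return (low_freq_img, high_freq_img) from per-block DCT coefficients."""
    h, w = img.shape
    low = np.zeros_like(img)
    high = np.zeros_like(img)
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = img[i:i + block, j:j + block]
            coeff = dctn(patch, norm="ortho")
            mask = np.zeros_like(coeff)
            mask[:low_size, :low_size] = 1.0            # keep the low-frequency corner
            low[i:i + block, j:j + block] = idctn(coeff * mask, norm="ortho")
            high[i:i + block, j:j + block] = idctn(coeff * (1 - mask), norm="ortho")
    return low, high

img = np.random.rand(64, 64)
low, high = blockwise_dct_subbands(img)
assert np.allclose(low + high, img)      # the split is exactly complementary
```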
Face Video Restoration (FVR) aims to reconstruct high-quality face videos from degraded input. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe, often producing average-looking faces that lack individual characteristics. To address these challenges, we introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process. IP-FVR incorporates semantically rich identity information from the reference image using decoupled cross-attention mechanisms, ensuring detailed and identity consistent results. For intra-clip identity drift (within 24 frames), we introduce an identity-preserving feedback learning method that combines cosine similarity-based reward signals with suffix-weighted temporal aggregation. This approach effectively minimizes drift within sequences of frames. For inter-clip identity drift, we develop an exponential blending strategy that aligns identities across clips by iteratively blending frames from previous clips during the denoising process. This method ensures consistent identity representation across different clips. Additionally, we enhance the restoration process with a multi-stream negative prompt, guiding the model's attention to relevant facial attributes and minimizing the generation of low-quality or incorrect features. Extensive experiments on both synthetic and real-world datasets demonstrate that IP-FVR outperforms existing methods in both quality and identity preservation, showcasing its substantial potential for practical applications in face video restoration. Our code and datasets are available at https://ip-fvr.github.io/.
In this paper, we propose the first diffusion-based all-in-one video restoration method that utilizes the power of a pre-trained Stable Diffusion and a fine-tuned ControlNet. Our method can restore various types of video degradation with a single unified model, overcoming the limitation of standard methods that require specific models for each restoration task. Our contributions include an efficient training strategy with Task Prompt Guidance (TPG) for diverse restoration tasks, an inference strategy that combines Denoising Diffusion Implicit Models (DDIM) inversion with a novel Sliding Window Cross-Frame Attention (SW-CFA) mechanism for enhanced content preservation and temporal consistency, and a scalable pipeline that makes our method all-in-one to adapt to different video restoration tasks. Through extensive experiments on five video restoration tasks, we demonstrate the superiority of our method in generalization capability to real-world videos and temporal consistency preservation over existing state-of-the-art methods. Our method advances the video restoration task by providing a unified solution that enhances video quality across multiple applications.
A spatio-temporal video filtering approach is proposed in this article. The video restoration technique introduced here successfully removes signal-independent noise, such as additive white Gaussian noise, that degrades the frame sequences. The proposed framework is based on a novel 3D fourth-order nonlinear reaction-diffusion model that reduces the additive white Gaussian noise (AWGN) considerably, overcomes side effects, and deals properly with the inter-frame correlation problem. A rigorous mathematical treatment is performed on this model and its validity is investigated. A numerical approximation algorithm that solves this nonlinear partial differential equation (PDE)-based model is then provided in this paper and applied successfully in the video denoising tests that are also described here.
No abstract available
High-resolution (HR) videos play a crucial role in many computer vision applications. Although existing video restoration (VR) methods can significantly enhance video quality by exploiting temporal information across video frames, they are typically trained for fixed upscaling factors and lack the flexibility to handle scales or degradations beyond their training distribution. In this paper, we introduce VR-INR, a novel video restoration approach based on Implicit Neural Representations (INRs) that is trained only on a single upscaling factor ($\times 4$) but generalizes effectively to arbitrary, unseen super-resolution scales at test time. Notably, VR-INR also performs zero-shot denoising on noisy input, despite never having seen noisy data during training. Our method employs a hierarchical spatial-temporal-texture encoding framework coupled with multi-resolution implicit hash encoding, enabling adaptive decoding of high-resolution and noise-suppressed frames from low-resolution inputs at any desired magnification. Experimental results show that VR-INR consistently maintains high-quality reconstructions at unseen scales and noise during training, significantly outperforming state-of-the-art approaches in sharpness, detail preservation, and denoising efficacy.
Diffusion models have emerged as powerful priors for single-image restoration, but their application to zero-shot video restoration suffers from temporal inconsistencies due to the stochastic nature of sampling and the complexity of incorporating explicit temporal modeling. In this work, we address the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models without retraining or modifying their architecture. We propose two complementary inference-time strategies: (1) Perceptual Straightening Guidance (PSG), based on the neuroscience-inspired perceptual straightening hypothesis, which steers the diffusion denoising process towards smoother temporal evolution by incorporating a curvature penalty in a perceptual space to improve temporal perceptual scores, such as Fréchet Video Distance (FVD) and perceptual straightness; and (2) Multi-Path Ensemble Sampling (MPES), which aims at reducing stochastic variation by ensembling multiple diffusion trajectories to improve fidelity (distortion) scores, such as PSNR and SSIM, without sacrificing sharpness. Together, these training-free techniques provide a practical path toward temporally stable, high-fidelity perceptual video restoration using large pretrained diffusion models. We performed extensive experiments over multiple datasets and degradation types, systematically evaluating each strategy to understand their strengths and limitations. Our results show that while PSG enhances temporal naturalness, particularly in the case of temporal blur, MPES consistently improves fidelity and the spatio-temporal perception-distortion trade-off across all tasks.
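A curvature penalty of the kind PSG uses can be illustrated as the mean turning angle of a frame trajectory in some feature space; the sketch below uses simple flattening as the "perceptual" representation and is only a schematic stand-in for the paper's actual guidance term.

```python
# Minimal sketch of a trajectory-curvature measure: the mean angle between
# successive displacement vectors of per-frame features; lower means straighter.
import torch

def trajectory_curvature(frames_feat, eps=1e-8):
    """frames_feat: (T, D) sequence of per-frame feature vectors."""
    v = frames_feat[1:] - frames_feat[:-1]                    # displacements (T-1, D)
    v = v / (v.norm(dim=1, keepdim=True) + eps)
    cos = (v[1:] * v[:-1]).sum(dim=1).clamp(-1.0, 1.0)        # cosines between steps
    return torch.acos(cos).mean()                             # mean turning angle (radians)

# Usage: penalize curvature of a frame sequence during sampling.
frames = torch.rand(8, 3, 32, 32)
feats = frames.flatten(start_dim=1)      # placeholder for a perceptual embedding
penalty = trajectory_curvature(feats)
```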
How to effectively explore spatial and temporal information is important for video deblurring. In contrast to existing methods that directly align adjacent frames without discrimination, we develop a deep discriminative spatial and temporal network to facilitate the spatial and temporal feature exploration for better video deblurring. We first develop a channel-wise gated dynamic network to adaptively explore the spatial information. As adjacent frames usually contain different contents, directly stacking features of adjacent frames without discrimination may affect the latent clear frame restoration. Therefore, we develop a simple yet effective discriminative temporal feature fusion module to obtain useful temporal features for latent frame restoration. Moreover, to utilize the information from long-range frames, we develop a wavelet-based feature propagation method that takes the discriminative temporal feature fusion module as the basic unit to effectively propagate main structures from long-range frames for better video deblurring. Experimental results show that the proposed method performs favorably against state-of-the-art ones on benchmark datasets in terms of accuracy and model complexity.
In this paper, we introduce DiQP, a novel Transformer-Diffusion model for restoring 8K video quality degraded by codec compression. To the best of our knowledge, our model is the first to consider restoring the artifacts introduced by various codecs (AV1, HEVC) by Denoising Diffusion without considering additional noise. This approach allows us to model the complex, non-Gaussian nature of compression artifacts, effectively learning to reverse the degradation. Our architecture combines the power of Transformers to capture long-range dependencies with an enhanced windowed mechanism that preserves spatiotemporal context within groups of pixels across frames. To further enhance restoration, the model incorporates auxiliary “Look Ahead” and “Look Around” modules, providing both future and surrounding frame information to aid in reconstructing fine details and enhancing overall visual quality. Extensive experiments on different datasets demonstrate that our model outperforms state-of-the-art methods, particularly for high-resolution videos such as 4K and 8K, showcasing its effectiveness in restoring perceptually pleasing videos from highly compressed sources. Code: https://github.com/alimd94/DiQP
A novel spatio-temporal video denoising and restoration framework is introduced in this article. The proposed filtering technique deals effectively with mixtures composed of both signal-independent and signal-dependent noise components, which degrade the video sequences. It is based on a new well-posed 3D nonlinear second-order reaction-diffusion model that properly addresses the inter-frame correlation issue. A discretization algorithm that solves this nonlinear PDE-based model numerically is then proposed and used successfully in the video noise reduction experiments that are also described in this work.
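To make the PDE-based formulation in these two entries concrete, the sketch below performs an explicit finite-difference step of a generic 3D (frame, row, column) nonlinear diffusion with a data-fidelity reaction term; the edge-stopping function and all constants are illustrative and not taken from the cited models.

```python
# Minimal sketch of one explicit step of a generic 3D nonlinear diffusion
# u <- u + dt * (div(g(|grad u|) * grad u) - lam * (u - u0)) on a video volume.
import numpy as np

def diffusivity(grad_mag, k=0.05):
    """Perona-Malik-style edge-stopping function (illustrative choice)."""
    return 1.0 / (1.0 + (grad_mag / k) ** 2)

def diffusion_step(u, u0, dt=0.1, lam=0.05):
    """u: current estimate (T, H, W); u0: noisy observation of the same shape."""
    grads = np.gradient(u)                          # derivatives along t, y, x
    grad_mag = np.sqrt(sum(g ** 2 for g in grads))
    g = diffusivity(grad_mag)
    div = sum(np.gradient(g * gi, axis=ax) for ax, gi in enumerate(grads))
    return u + dt * (div - lam * (u - u0))          # reaction term pulls toward the data

noisy = np.random.rand(5, 32, 32)
u = noisy.copy()
for _ in range(20):
    u = diffusion_step(u, noisy)
```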
To reconstruct high dynamic range (HDR) video from alternating exposed low dynamic range (LDR) frames, the key is to address the misalignment and imprecise fusion caused by information loss and noise in ill-exposed regions. Following a coarse-to-fine manner, a Multi-level Spatial-Temporal feature aggregation and alignment-based Selective Residual Dense Propagation Network (MSTSRDPNet) is proposed. The Multi-level Spatial-Temporal aggregation extracts spatial-temporal features and aggregates them to mitigate information loss for fusion. The alignment-based Selective Residual Dense Propagation module reconstructs the aligned feature by using channel attention to redistribute feature weights while leveraging residual dense connections for information propagation. Experiments show that the proposed MSTSRDPNet outperforms all conventional methods on the synthetic dataset with PSNR-T, HDR-VQM, and HDR-VDP-2 scores of 44.64 dB, 86.83, and 73.9.
Video prediction is essential for recreating absent frames in video sequences while maintaining temporal and spatial coherence. This procedure, known as video inpainting, seeks to reconstruct missing segments by utilizing data from available frames. Frame interpolation, a fundamental component of this methodology, detects and produces intermediary frames between input sequences. The suggested methodology presents a Bidirectional Video Prediction Network (BVPN) for precisely forecasting absent frames that occur before, after, or between specified input frames. The BVPN framework incorporates temporal aggregation and recurrent propagation to improve forecast accuracy. Temporal aggregation employs a series of reference frames to generate absent content by harnessing existing spatial and temporal data, hence assuring seamless coherence. Recurrent propagation enhances temporal consistency by integrating pertinent information from prior time steps to progressively improve predictions. The timing of frames is constantly controlled through intermediate activations in the BVPN, allowing for accurate synchronization and improved temporal alignment. A fusion module integrates intermediate interpretations to generate cohesive final outputs. Experimental assessments indicate that the suggested method surpasses current state-of-the-art techniques in video inpainting and prediction, attaining enhanced smoothness and precision. Surveillance video datasets demonstrate substantial enhancements in predictive accuracy, highlighting the strength and efficacy of the suggested strategy in practical application. Highlights: (1) the proposed method integrates bidirectional video prediction, temporal aggregation, and recurrent propagation to effectively reconstruct missing intermediate video frames with enhanced accuracy; (2) comparative analysis using the UCF-Crime dataset demonstrates higher PSNR and SSIM values for the proposed method, indicating improved frame quality and temporal consistency over existing techniques; (3) this research provides a robust framework for future advancements in video frame prediction, contributing to applications in anomaly detection, surveillance, and video restoration.
High-resolution (HR) medical videos are vital for accurate diagnosis, yet are hard to acquire due to hardware limitations and physiological constraints. Clinically, the collected low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models, including camera shake, noise, and abrupt frame transitions, which result in significant optical flow errors and alignment difficulties. Additionally, tissues and organs exhibit continuous and nuanced structures, but current VSR models are prone to introducing artifacts and distorted features that can mislead doctors. To this end, we propose MedVSR, a tailored framework for medical VSR. It first employs Cross State-Space Propagation (CSSP) to address the imprecise alignment by projecting distant frames as control matrices within state-space models, enabling the selective propagation of consistent and informative features to neighboring frames for effective alignment. Moreover, we design an Inner State-Space Reconstruction (ISSR) module that enhances tissue structures and reduces artifacts with joint long-range spatial feature learning and large-kernel short-range information aggregation. Experiments across four datasets in diverse medical scenarios, including endoscopy and cataract surgeries, show that MedVSR significantly outperforms existing VSR models in reconstruction performance and efficiency. Code released at https://github.com/CUHK-AIM-Group/MedVSR.
State Space Models (SSMs), most notably RNNs, have historically played a central role in sequential modeling. Although attention mechanisms such as Transformers have since dominated due to their ability to model global context, their quadratic complexity and limited scalability make them less suited for long sequences. Video super-resolution (VSR) methods have traditionally relied on recurrent architectures to propagate features across frames. However, such approaches suffer from well-known issues including vanishing gradients, lack of parallelism, and slow inference speed. Recent advances in selective SSMs like Mamba offer a compelling alternative: by enabling input-dependent state transitions with linear-time complexity, Mamba mitigates these issues while maintaining strong long-range modeling capabilities. Despite this potential, Mamba alone struggles to capture fine-grained spatial dependencies due to its causal nature and lack of explicit context aggregation. To address this, we propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation. Furthermore, we introduce Gather-Scatter Mamba (GSM), an alignment-aware mechanism that warps features toward a center anchor frame within the temporal window before Mamba propagation and scatters them back afterward, effectively reducing occlusion artifacts and ensuring effective redistribution of aggregated information across all frames. The official implementation is provided at: https://github.com/Ko-Lani/GSMamba.
LiDAR representation learning aims to extract rich structural and semantic information from large-scale, readily available datasets, reducing reliance on costly human annotations. However, existing LiDAR representation strategies often overlook the inherent spatiotemporal cues in LiDAR sequences, limiting their effectiveness. In this work, we propose LiMA, a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer-range temporal correlations to enhance LiDAR representation learning. LiMA comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments. LiMA maintains high pretraining efficiency and incurs no additional computational overhead during downstream tasks. Extensive experiments on mainstream LiDAR-based perception benchmarks demonstrate that LiMA significantly improves both LiDAR semantic segmentation and 3D object detection. We hope this work inspires more effective pretraining paradigms for autonomous driving. The code has been made publicly accessible for future research.
Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and the quadratic computational complexity of Transformers. Recently, the emerging state-space model, namely Mamba, has shown great potential in balancing global receptive fields and computational efficiency. As a solution, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles various distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD, and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature alignment and aggregation by modeling contextual dependencies. As a step further, to avoid the "task-specific" problem of previous SOD solutions, we develop Samba+, which is empowered by training Samba in a multi-task joint manner, leading to a more unified and versatile model. Two crucial components that collaboratively tackle the challenges of arbitrary-modality input and continual adaptation are investigated. Specifically, a hub-and-spoke graph attention (HGA) module facilitates adaptive cross-modal interactive fusion, and a modality-anchored continual learning (MACL) strategy alleviates inter-modal conflicts together with catastrophic forgetting. Extensive experiments demonstrate that Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost, whereas Samba+ achieves even superior results on these tasks and datasets using a single trained versatile model. Additional results further demonstrate the potential of our Samba framework.
A major challenge of the video inpainting task is aggregating spatial and temporal information in the corrupted video effectively. In this paper, we propose a dynamic graph memory bank to address this challenge. To model the long-range temporal dependency, a memory bank is built and updated dynamically with the incoming visual information flow. The relationships among the memory items are modeled through graph-based message propagation. Benefiting from the dynamic graph memory bank, both the contents and their relationships in the corrupted video are well exploited as the inpainting process goes on. Besides, the spatial misalignment across different frames may degrade the quality of features in the dynamic graph memory bank. To alleviate this issue, we propose a motion-guided feature alignment module. The proposed module cooperates with the dynamic graph memory bank to improve the network’s information aggregation ability in the spatial and temporal dimensions. Extensive experiments on the YouTube-VOS and DAVIS datasets demonstrate the superiority of our approach when compared with the state-of-the-art methods.
Research on face video super-resolution has made significant strides at 2x and 4x magnification, but there is comparatively less work on higher magnification tasks. Leveraging the spatial processing capabilities of Convolutional Neural Networks (CNNs) and the long-range dependency modeling of Transformers, this paper presents the Face Video Super-resolution CNN Transformer (FVSR-CT), an effective CNN- and Transformer-based model designed for high-magnification face video super-resolution tasks. However, designing an appropriate CNN- and Transformer-based model for high-magnification face video super-resolution is challenging due to the lack of sufficient spatial information in the input frames, difficulties in inter-frame feature alignment, and the high computational costs associated with high-resolution spatial modeling. To address these challenges, FVSR-CT advocates using Multi-channel Spatial Encoding to extract and enhance information from the input frames, employing Inter-frame Point-wise Masked Attention to establish inter-frame alignment, and implementing a Low-rank Decomposition Reconstruction Method to optimize the parameters of the attention mechanism. Compared to existing advanced models, our proposed method achieves highly competitive results. Such a CNN- and Transformer-based model can serve as a baseline for video super-resolution and other video reconstruction tasks.
With the growing demand for high-definition video in applications such as surveillance, streaming, and virtual reality, video super-resolution (VSR) has become a key technology. Due to the temporal correlation between frames in video, the VSR task needs to focus on how to capture and exploit the inter-frame dependencies in its design in order to solve problems such as occlusion and temporal misalignment, which are important features that distinguish it from single-image super-resolution. In this paper, we propose a variable sliding window VSR method based on Swin Transformer to address temporal misalignment and varying motion patterns in video sequences. By dynamically adjusting attention window sizes based on motion intensity, the model effectively balances local detail enhancement and long-range dependency modeling. The framework includes motion estimation, adaptive patch alignment, and multi-frame self-attention fusion, optimized using the Charbonnier loss. Our method was evaluated on the REDS4 dataset and compared with other baseline VSR approaches using PSNR and SSIM. This adaptive design enhances both reconstruction quality and computational efficiency, offering a robust solution for high-fidelity video restoration.
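The Charbonnier loss referenced above is a standard robust reconstruction penalty; the motion-to-window-size mapping below is only an illustrative guess at how attention window sizes could be switched by motion intensity (the thresholds and helper names are assumptions, not the paper's settings):

```python
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier loss: a differentiable, outlier-robust variant of L1."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def pick_window_size(flow: torch.Tensor) -> int:
    """Choose a Swin-style attention window size from mean optical-flow magnitude.
    flow: (B, 2, H, W) displacement field in pixels. Thresholds are illustrative."""
    motion = flow.norm(dim=1).mean().item()   # average per-pixel displacement
    if motion < 1.0:
        return 8      # small motion: small windows, cheaper attention
    elif motion < 4.0:
        return 12
    return 16         # large motion: larger windows for longer-range context

loss = charbonnier_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
win = pick_window_size(torch.zeros(1, 2, 64, 64))   # -> 8
```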
Video Super-Resolution (VSR) is a key technology for upgrading the perceptual quality of low-resolution video streams, with direct relevance to surveillance, remote sensing, and environmental observation. We introduce GridFormer-VSR, a Vision Transformer (ViT) architecture that simultaneously restores fine spatial structures and models long-range inter-frame relations. The design fuses two complementary attention operators: a local, windowed self-attention for neighborhood reasoning and a grid-organized global attention that promotes scene-wide information exchange. In parallel, a lightweight HaloMBConv pathway lowers computational overhead, while preserving edge and texture fidelity. For assessment, we assemble a new benchmark by generating temporally aligned video clips from the AID remote-sensing corpus, and we propose an evaluation suite that jointly examines image fidelity and temporal stability. Across multiple benchmarks, GridFormer-VSR establishes state-of-the-art performance, yielding consistent gains in Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS), while providing low-latency inference appropriate for real-time deployment. Owing to its scalable design, the model is well suited to operational use cases such as aerial surveillance and wide-area environmental monitoring. Collectively, these results position GridFormer-VSR as a robust and versatile solution for high-quality VSR.
We present STCDiT, a video super-resolution framework built upon a pre-trained video diffusion model, aiming to restore structurally faithful and temporally stable videos from degraded inputs, even under complex camera motions. The main challenges lie in maintaining temporal stability during reconstruction and preserving structural fidelity during generation. To address these challenges, we first develop a motion-aware VAE reconstruction method that performs segment-wise reconstruction, with each segment exhibiting uniform motion characteristics, thereby effectively handling videos with complex camera motions. Moreover, we observe that the first-frame latent extracted by the VAE encoder in each clip, termed the anchor-frame latent, remains unaffected by temporal compression and retains richer spatial structural information than subsequent frame latents. We further develop an anchor-frame guidance approach that leverages structural information from anchor frames to constrain the generation process and improve the structural fidelity of video features. Coupling these two designs enables the video diffusion model to achieve high-quality video super-resolution. Extensive experiments show that STCDiT outperforms state-of-the-art methods in terms of structural fidelity and temporal consistency.
With the increasing demand for high-definition video
No abstract available
No abstract available
No abstract available
Video super-resolution (VSR) is important in video processing for reconstructing high-definition image sequences from corresponding continuous and highly-related video frames. However, existing VSR methods have limitations in fusing spatial-temporal information. Some methods only fuse spatial-temporal information on a limited range of total input sequences, while others adopt a recurrent strategy that gradually attenuates the spatial information. While recent advances in VSR utilize Transformer-based methods to improve the quality of the upscaled videos, these methods require significant computational resources to model the long-range dependencies, which dramatically increases the model complexity. To address these issues, we propose a Collaborative Transformer for Video Super-Resolution (CTVSR). The proposed method integrates the strengths of Transformer-based and recurrent-based models by concurrently assimilating the spatial information derived from multi-scale receptive fields and the temporal information acquired from temporal trajectories. In particular, we propose a Spatial Enhanced Network (SEN) with two key components: Token Dropout Attention (TDA) and Deformable Multi-head Cross Attention (DMCA). TDA focuses on the key regions to extract more informative features, and DMCA employs deformable cross attention to gather information from adjacent frames. Moreover, we introduce a Temporal-trajectory Enhanced Network (TEN) that computes the similarity of a given token with temporally-related tokens in the temporal trajectory, which is different from previous methods that evaluate all tokens within the temporal dimension. With comprehensive quantitative and qualitative experiments on four widely-used VSR benchmarks, the proposed CTVSR achieves competitive performance with relatively low computational consumption and high forward speed.
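A hedged sketch of the token-dropout idea behind TDA: score tokens, keep only the most informative ones as keys and values, and attend from all queries to that reduced set. The scoring head, keep ratio, and overall structure are illustrative assumptions rather than the paper's exact module:

```python
import torch
import torch.nn as nn

class TokenDropoutAttention(nn.Module):
    """Illustrative sketch: rank tokens by a learned score, keep the top-k as
    keys/values, then attend from all query tokens to that reduced set."""

    def __init__(self, dim: int, num_heads: int = 4, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-token importance score
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) tokens of one frame
        B, N, C = x.shape
        k = max(1, int(N * self.keep_ratio))
        scores = self.score(x).squeeze(-1)              # (B, N)
        idx = scores.topk(k, dim=1).indices             # indices of kept tokens
        kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, C))  # (B, k, C)
        out, _ = self.attn(query=x, key=kept, value=kept)
        return out


y = TokenDropoutAttention(dim=64)(torch.randn(2, 196, 64))   # (2, 196, 64)
```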
No abstract available
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from highly degraded low-quality sequences affected by blur, additive noise, and compression artifacts. This work proposes a novel degradation-robust Frequency-Transformer (FTVSR++) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits a fine-grained self-attention on each frequency band so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a “divided attention”, which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely-used VSR datasets show that FTVSR++ outperforms state-of-the-art methods on different low-quality videos with clear visual margins.
Space-time video super-resolution (STVSR) is the task of interpolating videos with both low frame rate (LFR) and low resolution (LR) to produce high-frame-rate (HFR) and high-resolution (HR) counterparts. Existing methods based on convolutional neural networks (CNNs) achieve visually satisfying results but suffer from slow inference due to their heavy architectures. We propose to resolve this issue with a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not explicitly use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use a single end-to-end transformer architecture. Specifically, a reusable dictionary is built by encoders from the input LFR and LR frames, which is then utilized in the decoder to synthesize the HFR and HR frames. Compared with the state-of-the-art TMNet [54], our network is 60% smaller (4.5M vs 12.3M parameters) and 80% faster (26.2 fps vs 14.3 fps on 720 x 576 frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.
With stereo cameras becoming widely used in invasive surgery systems, stereo endoscopic images provide important depth information for delicate surgical tasks. However, the small size of sensors and their limited lighting conditions lead to low-quality and low-resolution endoscopic images and videos. In this paper, we propose a stereo endoscopic video super-resolution method using a transformer with a hybrid attention mechanism, named HA-VSR. Stereo video SR aims to reconstruct high-resolution (HR) images from corresponding low-resolution (LR) videos. In our method, the stereo correspondence and temporal correspondence are incorporated into the HA-VSR model. Specifically, the Swin transformer architecture is utilized in the proposed framework with hybrid attention mechanisms. The parallel attention mechanism exploits the symmetry and consistency of the left and right images, and the temporal attention mechanism exploits the consistency of consecutive frames. Detailed quantitative evaluations and experiments on two datasets show that the proposed model achieves advanced SR reconstruction performance and that the proposed stereo VSR framework outperforms alternative approaches.
Blind video super-resolution (BVSR) is a low-level vision task which aims to generate high-resolution videos from low-resolution counterparts in unknown degradation scenarios. Existing approaches typically predict blur kernels that are spatially invariant in each video frame or even the entire video. These methods do not consider potential spatio-temporal varying degradations in videos, resulting in suboptimal BVSR performance. In this context, we propose a novel BVSR model based on Implicit Kernels, BVSR-IK, which constructs a multi-scale kernel dictionary parameterized by implicit neural representations. It also employs a newly designed recurrent Transformer to predict the coefficient weights for accurate filtering in both frame correction and feature alignment. Experimental results have demonstrated the effectiveness of the proposed BVSR-IK, when compared with four state-of-the-art BVSR models on three commonly used datasets, with BVSR-IK outperforming the second best approach, FMA-Net, by up to 0.59 dB in PSNR. Source code will be available at https://github.com/QZ1-boy/BVSR-IK.
Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches often enhance an input frame by borrowing relevant textures from neighboring video frames. Although some progress has been made, there are grand challenges to effectively extract and transfer high-quality textures from compressed videos where most frames are usually highly degraded. In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band. Such a design enables a fine-grained level self-attention on each frequency band, so that real visual texture can be distinguished from artifacts, and further utilized for video frame restoration. Second, we study different self-attention schemes, and discover that a divided attention which conducts a joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality. Experimental results on two widely-used video super-resolution benchmarks show that FTVSR outperforms state-of-the-art approaches on both uncompressed and compressed videos with clear visual margins. Code is available at https://github.com/researchmm/FTVSR.
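A rough sketch of the first step described above, turning image patches into DCT spectral maps in which each output channel holds one frequency band; the patch size and normalization are assumptions for illustration, not the paper's exact settings:

```python
import numpy as np
from scipy.fft import dctn

def patches_to_spectral_maps(frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """frame: (H, W) grayscale image with H and W divisible by `patch`.
    Returns (H // patch, W // patch, patch * patch): each spatial location is
    one patch, and each channel holds one DCT frequency band of that patch."""
    H, W = frame.shape
    out = np.zeros((H // patch, W // patch, patch * patch), dtype=np.float32)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            coeffs = dctn(frame[i:i + patch, j:j + patch], norm="ortho")  # 2D DCT-II
            out[i // patch, j // patch] = coeffs.reshape(-1)              # bands -> channels
    return out

spec = patches_to_spectral_maps(np.random.rand(64, 64).astype(np.float32))
print(spec.shape)   # (8, 8, 64): 64 frequency-band channels per patch
```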
Video restoration remains an important task in multimedia processing because visual data captured in real environments often contain noise, motion artifacts, and resolution degradation. The demand for high-quality video has increased with the growth of surveillance systems, streaming platforms, and intelligent vision applications. Traditional denoising and super-resolution approaches rely on spatial filtering and convolutional neural networks, but these techniques struggle to model long-range temporal dependencies across frames. As a result, inconsistent textures, motion blur, and temporal flickering frequently appear in restored videos. The present study addresses these challenges by introducing the Recurrent Optical Flow Transformer (ROFT), a recurrent transformer architecture that integrates optical flow estimation with temporal attention for joint video denoising and super-resolution. The proposed framework uses a recurrent transformer module that captures temporal correlations between adjacent frames while maintaining spatial consistency. An optical flow estimation unit guides frame alignment, reducing motion distortion and misalignment during reconstruction. In addition, a temporal attention mechanism that analyzes contextual dependencies across multiple frames enhances feature representation for dynamic regions. The network processes sequential frames through recurrent connections that preserve temporal memory and improve reconstruction stability. Experiments conducted on benchmark video restoration datasets containing noisy and low-resolution sequences demonstrate that the proposed ROFT framework achieves superior performance compared with existing approaches. The model produces a PSNR of 35.8 dB and an SSIM of 0.97, indicating improved reconstruction quality and structural preservation. The reconstruction error decreases to 0.005 MSE, while the temporal consistency error reduces to 0.007, confirming stable frame transitions across video sequences. Furthermore, the model achieves an FSIM of 0.995, indicating strong preservation of perceptual texture features. These results demonstrate that the proposed architecture effectively integrates optical flow alignment and temporal transformer attention, enhancing both spatial detail recovery and temporal coherence in restored video frames.
Video super-resolution (VSR), with the aim to restore a high-resolution video from its corresponding low-resolution version, is a spatial-temporal sequence prediction problem. Recently, Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling. Thus, it seems to be straightforward to apply the vision Transformer to solve VSR. However, the typical block design of Transformer with a fully connected self-attention layer and a token-wise feed-forward layer does not fit well for VSR due to the following two reasons. First, the fully connected self-attention layer neglects to exploit the data locality because this layer relies on linear layers to compute attention maps. Second, the token-wise feed-forward layer lacks the feature alignment which is important for VSR since this layer independently processes each of the input token embeddings without any interaction among them. In this paper, we make the first attempt to adapt Transformer for VSR. Specifically, to tackle the first issue, we present a spatial-temporal convolutional self-attention layer with a theoretical understanding to exploit the locality information. For the second issue, we design a bidirectional optical flow-based feed-forward layer to discover the correlations across different video frames and also align features. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed method. The code will be available at https://github.com/caojiezhang/VSR-Transformer.
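Backward warping of neighbouring-frame features by optical flow is the standard operation underlying such a flow-based feed-forward layer; a self-contained sketch using torch.nn.functional.grid_sample, assuming the flow is given in pixels, might look like this:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `feat` (B, C, H, W) by `flow` (B, 2, H, W), flow in pixels.
    flow[:, 0] is horizontal (x) displacement, flow[:, 1] vertical (y)."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat)        # (2, H, W) base coords
    coords = grid.unsqueeze(0) + flow                            # displaced sampling positions
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)

# Zero flow returns the features unchanged (up to interpolation).
aligned = flow_warp(torch.randn(1, 64, 32, 32), torch.zeros(1, 2, 32, 32))
```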
The U-Net architecture has exhibited significant efficacy across various vision tasks, yet its adaptation for Video Super-Resolution (VSR) remains underexplored. While the Video Restoration Transformer (VRT) introduced U-Net into the VSR domain, it poses challenges due to intricate design and substantial computational overhead. In this paper, we present VMG, a streamlined framework tailored for VSR. Through empirical analysis, we identify the crucial stages of the U-Net architecture contributing to performance enhancement in VSR tasks. Our optimized architecture substantially reduces model parameters and complexity while improving performance. Additionally, we introduce two key modules, namely the Gated MLP-like Mixer (GMM) and the Flow-Guided cross-attention Mixer (FGM), designed to enhance spatial and temporal feature aggregation. GMM dynamically encodes spatial correlations with linear complexity in space and time, and FGM leverages optical flow to capture motion variation and implement sparse attention to efficiently aggregate temporally related information. Extensive experiments demonstrate that VMG achieves nearly 70% reduction in GPU memory usage, 30% fewer parameters, and 10% lower computational complexity (FLOPs) compared to VRT, while yielding highly competitive or superior results across four benchmark datasets. Qualitative assessments reveal VMG’s ability to preserve remarkable details and sharp structures in the reconstructed videos. The code and pre-trained models are available at https://github.com/EasyVision-Ton/VMG.
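For intuition, a generic gated MLP-style mixer with a token-mixing projection that is linear in the number of tokens is sketched below; it illustrates the family of blocks the GMM belongs to, not VMG's exact design:

```python
import torch
import torch.nn as nn

class GatedMLPMixer(nn.Module):
    """gMLP-style spatial gating: one channel half is mixed across the token
    dimension (linear in the number of tokens) and gates the other half."""

    def __init__(self, dim: int, num_tokens: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.expand = nn.Linear(dim, dim * 2)
        self.spatial = nn.Linear(num_tokens, num_tokens)   # token-mixing projection
        self.project = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) flattened spatial tokens of one frame
        u, v = self.expand(self.norm(x)).chunk(2, dim=-1)    # two (B, N, C) halves
        v = self.spatial(v.transpose(1, 2)).transpose(1, 2)  # mix across tokens
        return self.project(u * v) + x                       # gate, project, residual


out = GatedMLPMixer(dim=64, num_tokens=256)(torch.randn(2, 256, 64))   # (2, 256, 64)
```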
This paper presents a general-purpose video super-resolution (VSR) method, dubbed VSR-HE, specifically designed to enhance the perceptual quality of compressed content. Targeting scenarios characterized by heavy compression, the method upscales low-resolution videos by a ratio of four, from 180p to 720p or from 270p to 1080p. VSR-HE adopts hierarchical encoding transformer blocks and has been sophisticatedly optimized to eliminate a wide range of compression artifacts commonly introduced by H.265/HEVC encoding across various quantization parameter (QP) levels. To ensure robustness and generalization, the model is trained and evaluated under diverse compression settings, allowing it to effectively restore fine-grained details and preserve visual fidelity. The proposed VSR-HE has been officially submitted to the ICME 2025 Grand Challenge on VSR for Video Conferencing (Team BVI-VSR), under both the Track 1 (General-Purpose Real-World Video Content) and Track 2 (Talking Head Videos).
Continuous space-time video super-resolution (C-STVSR) has garnered increasing interest for its capability to reconstruct high-resolution and high-frame-rate videos at arbitrary spatial and temporal scales. However, prevailing methods often generalize poorly, producing unsatisfactory results when applied to out-of-distribution (OOD) scales. To overcome this limitation, we present EvEnhancer, a novel approach that marries the unique properties of high temporal resolution and high dynamic range encapsulated in event streams to achieve robust and generalizable C-STVSR. Our approach incorporates event-adapted synthesis that capitalizes on the spatiotemporal correlations between frames and events to capture long-term motion trajectories, enabling adaptive interpolation and fusion across space and time. This is then coupled with a local implicit video transformer that integrates local implicit video neural function with cross-scale spatiotemporal attention to learn continuous video representations and generate plausible videos at arbitrary resolutions and frame rates. We further develop EvEnhancerPlus, which builds a controllable switching mechanism that dynamically determines the reconstruction difficulty for each spatiotemporal pixel based on local event statistics. This allows the model to adaptively route reconstruction along the most suitable pathways at a fine-grained pixel level, substantially reducing computational overhead while maintaining excellent performance. Furthermore, we devise a cross-derivative training strategy that stabilizes the convergence of such a multi-pathway framework through staged cross-optimization. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both synthetic and real-world datasets, while maintaining superior generalizability at OOD scales. The code is available at https://github.com/W-Shuoyan/EvEnhancerPlus.
In this work, we rethink the approach to video super-resolution by introducing a method based on the Diffusion Posterior Sampling framework, combined with an unconditional video diffusion transformer operating in latent space. The video generation model, a diffusion transformer, functions as a spacetime model. We argue that a powerful model, which learns the physics of the real world, can easily handle various kinds of motion patterns as prior knowledge, thus eliminating the need for explicit estimation of optical flows or motion parameters for pixel alignment. Furthermore, a single instance of the proposed video diffusion transformer model can adapt to different sampling conditions without re-training. Empirical results on synthetic and real-world datasets illustrate the feasibility of diffusion-based, alignment-free video super-resolution.
Video super-resolution remains a major challenge in low-level vision tasks. To date, CNN- and Transformer-based methods have delivered impressive results. However, CNNs are limited by local receptive fields, while Transformers struggle with quadratic complexity, posing challenges for processing long sequences in VSR. Recently, Mamba has drawn attention for its long-sequence modeling, linear complexity, and large receptive fields. In this work, we propose VSRM, a novel Video Super-Resolution framework that leverages the power of Mamba. VSRM introduces Spatial-to-Temporal Mamba and Temporal-to-Spatial Mamba blocks to extract long-range spatio-temporal features and enhance receptive fields efficiently. To better align adjacent frames, we propose a Deformable Cross-Mamba Alignment module. This module utilizes a deformable cross-mamba mechanism to make the compensation stage more dynamic and flexible, preventing feature distortions. Finally, we minimize the frequency domain gaps between reconstructed and ground-truth frames by proposing a simple yet effective Frequency Charbonnier-like loss that better preserves high-frequency content and enhances visual quality. Through extensive experiments, VSRM achieves state-of-the-art results on diverse benchmarks, establishing itself as a solid foundation for future research.
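One plausible reading of a Frequency Charbonnier-like loss is a Charbonnier penalty applied to the difference between the FFT spectra of the reconstructed and ground-truth frames; the sketch below follows that reading and may differ from VSRM's exact formulation:

```python
import torch

def frequency_charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                               eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier-style penalty on the 2D FFT of prediction and ground truth.
    pred/target: (B, C, H, W). The magnitude of the complex spectral difference
    is penalized with the usual sqrt(x^2 + eps^2) smoothing."""
    pf = torch.fft.rfft2(pred, norm="ortho")
    tf = torch.fft.rfft2(target, norm="ortho")
    diff = pf - tf
    return torch.sqrt(diff.real ** 2 + diff.imag ** 2 + eps ** 2).mean()

loss = frequency_charbonnier_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```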
The tradeoff between reconstruction quality and compute required for video super-resolution (VSR) remains a formidable challenge in its adoption for deployment on resource-constrained edge devices. While transformer-based VSR models have set new benchmarks for reconstruction quality in recent years, these require substantial computational resources. On the other hand, lightweight models that have been introduced even recently struggle to deliver state-of-the-art reconstruction. We propose a novel lightweight and parameter-efficient neural architecture for VSR that achieves state-of-the-art reconstruction accuracy with just 2.3 million parameters. Our model enhances information utilization based on several architectural attributes. Firstly, it uses 2D wavelet decompositions strategically interlayered with learnable convolutional layers to utilize the inductive prior of spatial sparsity of edges in visual data. Secondly, it uses a single memory tensor to capture inter-frame temporal information while avoiding the computational cost of previous memory-based schemes. Thirdly, it uses residual deformable convolutions for implicit inter-frame object alignment that improve upon deformable convolutions by enhancing spatial information in inter-frame feature differences. Architectural insights from our model can pave the way for real-time VSR on the edge, such as display devices for streaming data.
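A single-level 2D Haar decomposition implemented as a strided convolution, of the kind that could be interlayered with learnable convolutional layers, is sketched below; the specific wavelet and implementation are assumptions for illustration, not the paper's actual layers:

```python
import torch
import torch.nn.functional as F

def haar_dwt2d(x: torch.Tensor) -> torch.Tensor:
    """Single-level 2D Haar wavelet transform of x (B, C, H, W) with even H, W.
    Returns (B, 4*C, H/2, W/2): LL, LH, HL, HH sub-bands stacked along channels."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).unsqueeze(1).to(x)      # (4, 1, 2, 2)
    B, C, H, W = x.shape
    # Apply the four analysis filters to every channel independently.
    out = F.conv2d(x.reshape(B * C, 1, H, W), kernels, stride=2)    # (B*C, 4, H/2, W/2)
    return out.reshape(B, C * 4, H // 2, W // 2)

bands = haar_dwt2d(torch.randn(1, 3, 64, 64))   # (1, 12, 32, 32)
```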
In this paper, we explore the space-time video super-resolution task, which aims to generate high frame rate (HFR) and high resolution (HR) videos from low frame rate (LFR) and low resolution (LR) videos. Most existing space-time video super-resolution methods simply combine the two sub-tasks of video frame interpolation (VFI) and video super-resolution (VSR). These methods usually use recursive propagation structures, but such structures are complex, very time-consuming, and do not make full use of feature information. To address these problems, we propose a single-stage space-time super-resolution architecture based on the Swin Transformer and second-order network propagation. The Swin Transformer allows a natural combination of the two sub-tasks into a single task, and the second-order network propagation enhances information propagation and efficiently utilizes the information of all input video frames. We also introduce a dataset pre-cleaning module, which not only alleviates image degradation before propagation, but also suppresses artifacts in the model output and improves the reconstruction performance of the proposed model. The experimental results show that, compared with the related two-stage network, our proposed model is lighter and its inference speed is faster with competitive performance.
Video super-resolution is the task of converting low-resolution video to high-resolution video. Existing methods with better visual quality are mainly based on convolutional neural networks (CNNs), but their architectures are heavy, resulting in slow inference. To address this problem, this paper proposes a Real-time Video Super-Resolution Transformer (RVSRT), which can quickly complete the super-resolution task while considering the visual fluency of video frame switching. Unlike traditional CNN-based methods, this paper does not process video frames separately with different network modules in the temporal domain, but batches adjacent frames through a single UNet-style, end-to-end Transformer network architecture. Moreover, this paper creatively sets up two-stage interpolation sampling before and after the end-to-end network to maximize the performance of the traditional CV algorithm. The experimental results show that, compared with the SOTA TMNet, RVSRT has only 50% of the network size (6.1M vs 12.3M parameters) while ensuring comparable performance, and the speed is increased by 80% (26.2 fps vs 14.3 fps at a frame size of 720×576).
Cloud Video Surveillance (CVS) systems, as the backbone of distributed surveillance networks, face increasing challenges in transmitting large volumes of high-resolution video data. While cloud-end collaborative video transmission methods can reduce bitrates beyond conventional compression techniques, the periodic transmission of high-resolution keyframes consumes significant bandwidth due to semantically irrelevant background information. To address this, we propose a cloud-edge-end collaborative video transmission scheme based on object-guided video super-resolution. In this scheme, the edge extracts key objects from sparsely selected keyframes at the end and transmits them along with low-resolution video to the cloud, where our Keyframe-Guided Video Restoration Transformer (KG-VRT) is used to improve the video quality. Experimental results on public datasets show that our network outperforms state-of-the-art keyframe-based baselines with a 1.73 dB PSNR improvement and maintains robust performance even with keyframe intervals of up to 30 frames. A comparative analysis of two transmission strategies—transmitting full keyframes versus transmitting only key object regions—demonstrates a 60% – 80% reduction in keyframe bitrate while maintaining object detection accuracy at a significantly reduced overall system bitrate. This highlights the efficiency and scalability of our approach in bandwidth-constrained surveillance scenarios.
Existing video super-resolution (SR) algorithms usually assume that the blur kernels in the degradation process are known and do not model the blur kernels in the restoration. However, this assumption does not hold for blind video SR and usually leads to over-smoothed super-resolved frames. In this paper, we propose an effective blind video SR algorithm based on deep convolutional neural networks (CNNs). Our algorithm first estimates blur kernels from low-resolution (LR) input videos. Then, with the estimated blur kernels, we develop an effective image deconvolution method based on the image formation model of blind video SR to generate intermediate latent frames so that sharp image contents can be restored well. To effectively explore the information from adjacent frames, we estimate the motion fields from LR input videos, extract features from LR videos by a feature extraction network, and warp the extracted features from LR inputs based on the motion fields. Moreover, we develop an effective sharp feature exploration method which first extracts sharp features from restored intermediate latent frames and then uses a transformation operation based on the extracted sharp features and warped features from LR inputs to generate better features for HR video restoration. We formulate the proposed algorithm into an end-to-end trainable framework and show that it performs favorably against state-of-the-art methods.
Existing deep learning-based video super-resolution (SR) methods usually depend on the supervised learning approach, where the training data is usually generated by a blurring operation with known or predefined kernels (e.g., the bicubic kernel) followed by a decimation operation. However, this does not hold for real applications, as the degradation process is complex and cannot be approximated by these ideal cases well. Moreover, obtaining high-resolution (HR) videos and the corresponding low-resolution (LR) ones in real-world scenarios is difficult. To overcome these problems, we propose a self-supervised learning method to solve the blind video SR problem, which simultaneously estimates blur kernels and HR videos from the LR videos. As directly using LR videos as supervision usually leads to trivial solutions, we develop a simple and effective method to generate auxiliary paired data from original LR videos according to the image formation of video SR, so that the networks can be better constrained by the generated paired data for both blur kernel estimation and latent HR video restoration. In addition, we introduce an optical flow estimation module to exploit the information from adjacent frames for HR video restoration. Experiments show that our method performs favorably against state-of-the-art ones on benchmarks and real-world videos.
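The generic idea of generating auxiliary paired data from the LR video itself is to re-apply the estimated degradation (blur followed by decimation) to each LR frame, so the original LR frame can supervise restoration of its further-degraded copy; the helper below is an illustrative sketch with an assumed kernel and scale, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def make_auxiliary_pair(lr_frame: torch.Tensor, kernel: torch.Tensor, scale: int = 4):
    """Re-apply an estimated degradation (blur + decimation) to an LR frame.
    lr_frame: (B, C, H, W); kernel: (k, k) estimated blur kernel.
    Returns (further-degraded input, original LR frame as supervision)."""
    k = kernel.shape[-1]
    C = lr_frame.shape[1]
    weight = kernel.repeat(C, 1, 1, 1).to(lr_frame)             # same kernel per channel
    blurred = F.conv2d(lr_frame, weight, padding=k // 2, groups=C)
    lr_of_lr = blurred[:, :, ::scale, ::scale]                  # decimation by `scale`
    return lr_of_lr, lr_frame

pair = make_auxiliary_pair(torch.rand(1, 3, 64, 64), torch.ones(5, 5) / 25.0)
```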
Video super-resolution (VSR) techniques, especially deep-learning-based algorithms, have drastically improved over the last few years and shown impressive performance on synthetic data. However, their performance on real-world video data suffers because of the complexity of real-world degradations and misaligned video frames. Since obtaining a synthetic dataset consisting of low-resolution (LR) and high-resolution (HR) frames is easier than obtaining real-world LR and HR images, in this paper, we propose synthesizing real-world degradations on synthetic training datasets. The proposed synthetic real-world degradations (SRWD) include a combination of blur, noise, down-sampling, pixel binning, and image and video compression artifacts. We then propose using a random shuffling-based strategy to simulate these degradations on the training datasets and train a single end-to-end deep neural network (DNN) on the proposed larger variation of realistic synthesized training data. Our quantitative and qualitative comparative analysis shows that the proposed training strategy using diverse realistic degradations improves the performance by 7.1% in terms of NRQM compared to RealBasicVSR and by 3.34% compared to BSRGAN on the VideoLQ dataset. We also introduce a new dataset that contains high-resolution real-world videos that can serve as a common ground for benchmarking.
No abstract available
Most conventional supervised super-resolution (SR) algorithms assume that low-resolution (LR) data is obtained by downscaling high-resolution (HR) data with a fixed known kernel, but such an assumption often does not hold in real scenarios. Some recent blind SR algorithms have been proposed to estimate different downscaling kernels for each input LR image. However, they suffer from heavy computational overhead, making them infeasible for direct application to videos. In this work, we present DynaVSR, a novel meta-learning-based framework for real-world video SR that enables efficient downscaling model estimation and adaptation to the current input. Specifically, we train a multi-frame downscaling module with various types of synthetic blur kernels, which is seamlessly combined with a video SR network for input-aware adaptation. Experimental results show that DynaVSR consistently improves the performance of the state-of-the-art video SR models by a large margin, with an order of magnitude faster inference time compared to the existing blind SR approaches.
Deep learning-based blind super-resolution (SR) methods have recently achieved unprecedented performance in upscaling frames with unknown degradation. These models are able to accurately estimate the unknown downscaling kernel from a given low-resolution (LR) image in order to leverage the kernel during restoration. Although these approaches have largely been successful, they are predominantly image-based and therefore do not exploit the temporal properties of the kernels across multiple video frames. In this paper, we investigated the temporal properties of the kernels and highlighted their importance in the task of blind video super-resolution. Specifically, we measured the kernel temporal consistency of real-world videos and illustrated how the estimated kernels might change per frame in videos of varying dynamicity of the scene and its objects. With this new insight, we revisited previous popular video SR approaches, and showed that previous assumptions of using a fixed kernel throughout the restoration process can lead to visual artifacts when upscaling real-world videos. In order to counteract this, we tailored existing single-image and video SR techniques to leverage kernel consistency during both kernel estimation and video upscaling processes. Extensive experiments on synthetic and real-world videos show substantial restoration gains quantitatively and qualitatively, achieving the new state-of-the-art in blind video SR and underlining the potential of exploiting kernel temporal consistency.
Recent efforts have witnessed remarkable progress in satellite video super-resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has, thus, become a research hotspot. Nevertheless, the existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this article proposes a practical blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixelwise blur levels in a coarse-to-fine manner. Specifically, we employed multiscale deformable (MSD) convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then, the adjacent features are finely merged into mid-feature using deformable attention (DA), which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation (PST) module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multilevel domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art nonblind and blind SR models. Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR.
Super-resolution (SR) of satellite video has long been a critical research direction in the field of remote sensing video processing and analysis, and blind SR has attracted increasing attention in the face of satellite video with unknown degradation. However, existing blind SR methods mainly focus on accurate blur kernel estimation, while ignoring the importance of interframe information compensation in the time domain. Therefore, this article focuses on precise temporal information compensation and proposes a blind SR network based on interframe information compensation. First, we propose a multiscale parallel convolution block to alleviate the difficulty of alignment between satellite video frames due to the presence of moving objects of different scales. Second, we propose a hybrid attention-based feature extraction module that effectively extracts both local and global information between video frames. While activating more pixels, more attention is allocated to informative pixels to obtain the clean features. Finally, a pyramid space activation module is proposed to gradually adjust the clean features through a multilayer iterative pyramid structure, enabling the clean features to better perceive blur and achieve pixel-level fine compensation for unknown degraded frames. Extensive experiments on real satellite video datasets demonstrate that our method is superior to state-of-the-art non-blind and blind SR methods, both qualitatively and quantitatively.
Satellite video images contain temporal contextual information that is unavailable in single-frame images. Therefore, using a sequence of frames for super-resolution can significantly enhance the reconstruction effect. However, most existing satellite Video Super-Resolution (VSR) methods focus on improving the network's representation ability, overlooking the complex degradation processes present in real-world satellite videos, which pose a blind SR problem. In this paper, we propose an effective satellite VSR method based on a unidirectional recurrent network named URD-VSR. Simultaneously, a network independent of the SR structure is utilized to model the degradation process. Experiments on real satellite video datasets and integration with object detection demonstrate the effectiveness of the proposed method.
Video super-resolution (VSR) on mobile devices aims to restore high-resolution frames from their low-resolution counterparts, satisfying the requirements of performance, FLOPs and latency. On the one hand, partial feature processing, as a classic and acknowledged strategy, is developed in current studies to reach an appropriate trade-off between FLOPs and accuracy. However, the splitting in the partial feature processing strategy is usually performed in a blind manner, thereby reducing the computational efficiency and performance gains. On the other hand, current methods for mobile platforms primarily treat VSR as an extension of single-image super-resolution to reduce model calculation and inference latency. However, the lack of inter-frame information interaction in current methods results in a suboptimal latency and accuracy trade-off. To this end, we propose a novel architecture, termed Feature Aggregating Network with Inter-frame Interaction (FANI), a lightweight VSR network that nonetheless considers frame-wise correlation and achieves real-time inference while maintaining superior performance. Our FANI accepts adjacent multi-frame low-resolution images as input and generally consists of several fully-connection-embedded modules, i.e., Multi-stage Partial Feature Distillation (MPFD) modules, for capturing multi-level feature representations. Moreover, considering the importance of inter-frame alignment, we further employ a tiny Attention-based Frame Alignment (AFA) module to promote inter-frame information flow and aggregation efficiently. Extensive experiments on a well-known dataset and a real-world mobile device demonstrate the superiority of our proposed FANI, which means that our FANI could be well adapted to mobile devices and produce visually pleasing results.
No abstract available
Ultrasound imaging is widely applied in clinical practice, yet ultrasound videos often suffer from low signal-to-noise ratios (SNR) and limited resolutions, posing challenges for diagnosis and analysis. Variations in equipment and acquisition settings can further exacerbate differences in data distribution and noise levels, reducing the generalizability of pre-trained models. This work presents a self-supervised ultrasound video super-resolution algorithm called Deep Ultrasound Prior (DUP). DUP employs a video-adaptive optimization process of a neural network that enhances the resolution of given ultrasound videos without requiring paired training data while simultaneously removing noise. Quantitative and visual evaluations demonstrate that DUP outperforms existing super-resolution algorithms, leading to substantial improvements for downstream applications.
Most of the existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification, and the frames in this dataset are of low quality. As a consequence, the VFSR models trained on this dataset cannot output visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over 16,000 high-fidelity clips of diverse interview scenarios. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that the temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ, we further conduct a benchmarking study of several state-of-the-art algorithms under bicubic and blind settings.
When unknown degradation is mixed with unknown blur kernels, how to perform super-resolution is an open issue. The main idea of existing zero-shot and non-zero-shot methods is to estimate the blur kernel, and the effectiveness of these methods depends on the accuracy of the deduced kernel. In this paper, we propose the Randomly initialized Zero-Shot Super-Resolution (RZSR) training strategy. RZSR is a zero-shot training method that allows the network to extract low-resolution image features and generate the corresponding high-resolution images under the interference of degradation algorithms. We further propose two model-agnostic modules, an Adaptive Information Extraction Module (AIEM) and a knowledge dictionary. They respectively assist the network to extract features and to fit the data distribution of clear images well. RZSR can be applied to any single image super-resolution and video super-resolution model. We prove the generalization ability and superiority of RZSR through a series of experiments.
Super-resolution image reconstruction under unknown Gaussian blur has been a challenging topic. Advanced optimization-based works for blind image super-resolution (SR) have been reported to be effective, but they incur both large data storage and long computation times due to vector-variable optimization. This paper proposes a matrix-variable optimization method for fast blind image SR. We first present an accurate blur kernel estimation-based matrix decomposition method. Then we propose minimizing a matrix-variable optimization problem with sparse representation and TV regularization terms. The proposed method can exactly estimate the unknown blur kernel and blur matrix. Compared with vector-variable optimization-based methods for blind image SR, the proposed method can greatly reduce their data storage and computation time. Compared with deep learning methods, the proposed method can directly deal with the multi-frame SR problem without any training or learning task. Experimental results show that the proposed algorithm is superior to the conventional optimization-based method in terms of solution quality and computation time. Moreover, the proposed method can obtain higher reconstruction quality than the deep learning methods, especially in the case of large blur kernels.
No abstract available
No abstract available
Recently, there have been significant advances in video super-resolution (VSR) techniques under blind and practical degradation settings. These techniques restore the fine details of each video frame while maintaining the temporal consistency between frames for smooth motion. Unfortunately, many attempts still fall short in the case of real-world videos. When diverse and complex in-the-wild degradation is introduced, the task becomes non-trivial and challenging. As a result, VSR techniques perform poorly in general. We argue that there is more room to improve the performance of VSR methods, as current methods are only trained on image-level degradation settings, leading to a restoration quality that may be sub-optimal for real-world degradation that varies pixel-wise within an image. To this end, we propose RealPixVSR, which leverages pixel-level representations to improve the pixel-level sensitivity to degradation. The pixel-level content-invariant degradation representation is learned in a self-supervised manner using a contrastive learning network referred to as the Pixel-Degradation-Representation-Network (PDRN). The learned visual representation is then merged with the cleaning and restoration networks using the Pixel-Degradation-Informed-Block (PDIB). Through experiments, we show that our network outperforms the latest state-of-the-art VSR models for real-world video.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
Super-Resolution (SR) is a fundamental computer vision task that aims to obtain a high-resolution clean image from the given low-resolution counterpart. This paper reviews the NTIRE 2021 Challenge on Video Super-Resolution. We present evaluation results from two competition tracks as well as the proposed solutions. Track 1 aims to develop conventional video SR methods focusing on restoration quality. Track 2 assumes a more challenging environment with lower frame rates, casting a spatio-temporal SR problem. In the two tracks, 247 and 223 participants registered, respectively. During the final testing phase, 14 teams competed in each track to achieve state-of-the-art performance on video SR tasks.
In this paper, we address the space-time video super-resolution, which aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence. A naïve method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). Nevertheless, temporal interpolation and spatial upscaling are intra-related in this problem. Two-stage approaches cannot fully make use of this natural property. Besides, state-of-the-art VFI or VSR deep networks usually have a large frame reconstruction module in order to obtain high-quality photo-realistic video frames, which makes the two-stage approaches have large models and thus be relatively time-consuming. To overcome the issues, we present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video. Instead of reconstructing missing LR intermediate frames as VFI models do, we temporally interpolate LR frame features of the missing LR frames capturing local temporal contexts by a feature temporal interpolation module. Extensive experiments on widely used benchmarks demonstrate that the proposed framework not only achieves better qualitative and quantitative performance on both clean and noisy LR frames but also is several times faster than recent state-of-the-art two-stage networks. The source code is released in https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 .
The capability of video super-resolution (VSR) to synthesize high-resolution (HR) video from ideal datasets has been demonstrated in many works. However, applying the VSR model to real-world video with unknown and complex degradation remains a challenging task. First, existing degradation metrics in most VSR methods are not able to effectively simulate real-world noise and blur. On the contrary, simple combinations of classical degradation are used for real-world noise modeling, which led to the VSR model often being violated by out-of-distribution noise. Second, many SR models focus on noise simulation and transfer. Nevertheless, the sampled noise is monotonous and limited. To address the aforementioned problems, we propose a Negatives augmentation strategy for generalized noise modeling in Video Super-Resolution (NegVSR) task. Specifically, we first propose sequential noise generation toward real-world data to extract practical noise sequences. Then, the degeneration domain is widely expanded by negative augmentation to build up various yet challenging real-world noise sets. We further propose the augmented negative guidance loss to learn robust features among augmented negatives effectively. Extensive experiments on real-world datasets (e.g., VideoLQ and FLIR) show that our method outperforms state-of-the-art methods with clear margins, especially in visual quality. Project page is available at: https://negvsr.github.io/.
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires to utilize temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding window strategy or a recurrent architecture, which either is restricted by frame-by-frame restoration or lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. Besides, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16 dB) on fourteen benchmark datasets.
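The clip partitioning with a shift every other layer resembles shifted-window attention applied along the temporal axis; a minimal sketch of that partitioning (the clip length and the use of torch.roll are assumptions about the mechanism, not VRT's exact implementation) is given below:

```python
import torch

def partition_clips(feats: torch.Tensor, clip_len: int = 2, shift: bool = False) -> torch.Tensor:
    """feats: (B, T, C) per-frame feature tokens with T divisible by clip_len.
    Optionally roll the sequence by half a clip so that, in alternating layers,
    attention windows straddle the previous clip boundaries."""
    if shift:
        feats = torch.roll(feats, shifts=-clip_len // 2, dims=1)
    B, T, C = feats.shape
    return feats.reshape(B, T // clip_len, clip_len, C)   # (B, num_clips, clip_len, C)

clips = partition_clips(torch.randn(1, 8, 64), clip_len=2, shift=True)   # (1, 4, 2, 64)
```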
Video super-resolution (VSR) is the task of restoring high-resolution frames from a sequence of low-resolution inputs. Different from single image super-resolution, VSR can utilize frames' temporal information to reconstruct results with more details. Recently, with the rapid development of convolutional neural networks (CNNs), the VSR task has drawn increasing attention and many CNN-based methods have achieved remarkable results. However, only a few VSR approaches can be applied to real-world mobile devices due to computational resource and runtime limitations. In this paper, we propose a Sliding Window based Recurrent Network (SWRN), which supports real-time inference while still achieving superior performance. Specifically, we notice that video frames have both spatial and temporal relations that can help to recover details, and the key point is how to extract and aggregate this information. To address this, we input three neighboring frames and utilize a hidden state to recurrently store and update the important temporal information. Our experiment on the REDS dataset shows that the proposed method can be well adapted to mobile devices and produce visually pleasing results.
This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state-of-the-art for the burst super-resolution task.
This paper explores an efficient solution for space-time super-resolution, aiming to generate high-resolution slow-motion videos from low-resolution and low-frame-rate videos. A simplistic solution is the sequential running of video super-resolution and video frame interpolation models. However, this type of solution is memory-inefficient, has a high inference time, and cannot properly exploit the space-time relation. To this end, we first interpolate in LR space using quadratic modeling. Input LR frames are super-resolved using a state-of-the-art video super-resolution method. The flow maps and blending mask used to synthesize the LR interpolated frame are reused in HR space using bilinear upsampling. This leads to a coarse estimate of the HR intermediate frame, which often contains artifacts along motion boundaries. We use a refinement network to improve the quality of the HR intermediate frame via residual learning. Our model is lightweight and performs better than current state-of-the-art models on the REDS STSR validation set.
As a fundamental challenge in visual computing, video super-resolution (VSR) focuses on reconstructing high-definition video sequences from their degraded low-resolution counterparts. While deep convolutional neural networks have demonstrated state-of-the-art performance in spatial-temporal super-resolution tasks, their computationally intensive nature poses significant deployment challenges for resource-constrained edge devices, particularly in real-time mobile video processing scenarios where power efficiency and latency constraints coexist. In this work, we propose a Reparameterizable Architecture for High Fidelity Video Super Resolution method, named RepNet-VSR, for real-time 4x video super-resolution. On the REDS validation set, the proposed model achieves 27.79 dB PSNR when processing 180p to 720p frames in 103 ms per 10 frames on a MediaTek Dimensity NPU. The competition results demonstrate an excellent balance between restoration quality and deployment efficiency. The proposed method scores higher than the previous champion algorithm of the MAI video super-resolution challenge.
In this paper, we consider the task of space-time video super-resolution (ST-VSR), which can increase the spatial resolution and frame rate for a given video simultaneously. Despite the remarkable progress of recent methods, most of them still suffer from high computational costs and inefficient long-range information usage. To alleviate these problems, we propose a Bidirectional Recurrence Network (BRN) with an optical-flow-reuse strategy to better use temporal knowledge from long-range neighboring frames for high-efficiency reconstruction. Specifically, an efficient and memory-saving multi-frame motion utilization strategy is proposed by reusing the intermediate flow of adjacent frames, which considerably reduces the computation burden of frame alignment compared with traditional LSTM-based designs. In addition, the proposed hidden state in BRN is updated by the reused optical flow and refined by the Feature Refinement Module (FRM) for further optimization. Moreover, by utilizing intermediate flow estimation, the proposed method can infer non-linear motion and restore details better. Extensive experiments demonstrate that our optical-flow-reuse-based bidirectional recurrent network (OFR-BRN) is superior to state-of-the-art methods in accuracy and efficiency.
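One way to picture the flow-reuse idea is via standard flow composition, where a longer-range flow is assembled from adjacent flows that have already been computed rather than re-estimated from scratch; treating this as the precise mechanism inside OFR-BRN is an assumption on our part:
$$ f_{t \to t+2}(\mathbf{x}) \;\approx\; f_{t \to t+1}(\mathbf{x}) \;+\; f_{t+1 \to t+2}\bigl(\mathbf{x} + f_{t \to t+1}(\mathbf{x})\bigr), $$
so each newly estimated adjacent flow can be chained onto the flows already in hand, instead of running a fresh alignment for every frame pair.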
Real-world video super-resolution (VSR) presents significant challenges due to complex and unpredictable degradations. Although some recent methods utilize image diffusion models for VSR and have shown improved detail generation capabilities, they still struggle to produce temporally consistent frames. We attempt to use Stable Video Diffusion (SVD) combined with ControlNet to address this issue. However, due to the intrinsic image-animation characteristics of SVD, it is challenging to generate fine details using only low-quality videos. To tackle this problem, we propose DAM-VSR, an appearance and motion disentanglement framework for VSR. This framework disentangles VSR into appearance enhancement and motion control problems. Specifically, appearance enhancement is achieved through reference image super-resolution, while motion control is achieved through video ControlNet. This disentanglement fully leverages the generative prior of video diffusion models and the detail generation capabilities of image super-resolution models. Furthermore, equipped with the proposed motion-aligned bidirectional sampling strategy, DAM-VSR can conduct VSR on longer input videos. DAM-VSR achieves state-of-the-art performance on real-world data and AIGC data, demonstrating its powerful detail generation capabilities.
A large number of image super-resolution algorithms based on sparse coding have been proposed, and some of them realize multi-frame super-resolution. In multi-frame super-resolution based on sparse coding, both accurate image registration and sparse coding are required. Previous studies on multi-frame super-resolution based on sparse coding first apply block matching for image registration, followed by sparse coding to enhance the image resolution. In this paper, these two problems are solved by optimizing a single objective function. The results of numerical experiments support the effectiveness of the proposed approach.
Light field cameras capture the 3D information in a scene with a single exposure. This special feature makes light field cameras very appealing for a variety of applications: from post-capture refocus, to depth estimation and image-based rendering. However, light field cameras suffer by design from strong limitations in their spatial resolution, which should therefore be augmented by computational methods. On the one hand, off-the-shelf single-frame and multi-frame super-resolution algorithms are not ideal for light field data, as they do not consider its particular structure. On the other hand, the few super-resolution algorithms explicitly tailored for light field data exhibit significant limitations, such as the need to estimate an explicit disparity map at each view. In this work we propose a new light field super-resolution algorithm meant to address these limitations. We adopt an approach akin to multi-frame super-resolution, where the complementary information in the different light field views is used to augment the spatial resolution of the whole light field. We show that coupling the multi-frame approach with a graph regularizer that enforces the light field structure via nonlocal self-similarities makes it possible to avoid the costly and challenging disparity estimation step for all the views. Extensive experiments show that the new algorithm compares favorably to the other state-of-the-art methods for light field super-resolution, both in terms of PSNR and visual quality.
This work addresses the Burst Super-Resolution (BurstSR) task using a new architecture, which requires restoring a high-quality image from a sequence of noisy, misaligned, and low-resolution RAW bursts. To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction. To achieve this goal, we propose a Pyramid Flow-Guided Deformable Convolution Network (Pyramid FG-DCN) and incorporate Swin Transformer Blocks and Groups as our main backbone. More specifically, we combine optical flows and deformable convolutions, hence our BSRT can handle misalignment and aggregate the potential texture information across multiple frames more efficiently. In addition, our Transformer-based structure can capture long-range dependency to further improve the performance. The evaluation on both synthetic and real-world tracks demonstrates that our approach achieves a new state-of-the-art in the BurstSR task. Further, our BSRT wins the championship in the NTIRE2022 Burst Super-Resolution Challenge.
Camera pipelines receive raw Bayer-format frames that need to be denoised, demosaiced, and often super-resolved. Multiple frames are captured to utilize natural hand tremors and enhance resolution. Multi-frame super-resolution is therefore a fundamental problem in camera pipelines. Existing adversarial methods are constrained by the quality of ground truth. We propose GenMFSR, the first Generative Multi-Frame Raw-to-RGB Super Resolution pipeline, that incorporates image priors from foundation models to obtain sub-pixel information for camera ISP applications. GenMFSR can align multiple raw frames, unlike existing single-frame super-resolution methods, and we propose a loss term that restricts generation to high-frequency regions in the raw domain, thus preventing low-frequency artifacts.
The objective of image super-resolution is to reconstruct a high-resolution (HR) image with the prior knowledge from one or several low-resolution (LR) images. However, in the real world, due to the limited complementary information, the performance of both single-frame and multi-frame super-resolution reconstruction degrades rapidly as the magnification increases. In this paper, we propose a novel two-step image super-resolution method concatenating multi-frame super-resolution (MFSR) with single-frame super-resolution (SFSR), to progressively upsample images to the desired resolution. The proposed method consists of an L0-norm constrained reconstruction scheme and an enhanced residual back-projection network, integrating the flexibility of the variational model-based method and the feature learning capacity of the deep learning-based method. To verify the effectiveness of the proposed algorithm, extensive experiments with both simulated and real-world sequences were conducted. The experimental results show that the proposed method yields superior performance in both objective and perceptual quality measurements. The average PSNRs of the cascade model on Set5 and Set14 are 33.413 dB and 29.658 dB, respectively, which are 0.76 dB and 0.621 dB higher than the baseline method. In addition, the experiments indicate that this cascade model can be robustly applied to different SFSR and MFSR methods.
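As a rough illustration of the kind of objective an L0-norm constrained MFSR reconstruction step typically minimizes (the paper's exact data term, degradation operators, and regularizer are assumptions here):
$$ \hat{x} \;=\; \arg\min_{x} \; \sum_{k=1}^{K} \bigl\| D\,B\,W_{k}\,x - y_{k} \bigr\|_{2}^{2} \;+\; \lambda \,\bigl\| \nabla x \bigr\|_{0}, $$
where $y_k$ are the registered LR observations, $W_k$ warps the latent HR image $x$ toward frame $k$, $B$ and $D$ model blur and downsampling, and the $\ell_0$ penalty on the gradient favors piecewise-smooth reconstructions; in a cascade of this kind, the output of such a step would then be passed to the SFSR network for the second upsampling stage.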
We introduce the notion of point affiliation into feature upsampling. By abstracting a feature map into non-overlapped semantic clusters formed by points of identical semantic meaning, feature upsampling can be viewed as point affiliation -- designating a semantic cluster for each upsampled point. In the framework of kernel-based dynamic upsampling, we show that an upsampled point can resort to its low-res decoder neighbors and high-res encoder point to reason the affiliation, conditioned on the mutual similarity between them. We therefore present a generic formulation for generating similarity-aware upsampling kernels and prove that such kernels encourage not only semantic smoothness but also boundary sharpness. This formulation constitutes a novel, lightweight, and universal upsampling solution, Similarity-Aware Point Affiliation (SAPA). We show its working mechanism via our preliminary designs with window-shape kernel. After probing the limitations of the designs on object detection, we reveal additional insights for upsampling, leading to SAPA with the dynamic kernel shape. Extensive experiments demonstrate that SAPA outperforms prior upsamplers and invites consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, instance segmentation, panoptic segmentation, image matting, and depth estimation. Code is made available at: https://github.com/tiny-smart/sapa
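A simplified sketch of the similarity-aware kernel idea, assuming a PyTorch implementation and a 2x upsampling ratio; the function name, dot-product similarity, and window handling below are illustrative choices, not the released SAPA code:

```python
import torch
import torch.nn.functional as F

def similarity_aware_upsample(decoder_lr, encoder_hr, window=3):
    """decoder_lr: (B, C, h, w) low-res decoder feature; encoder_hr: (B, C, 2h, 2w) encoder feature."""
    B, C, H, W = encoder_hr.shape
    # low-res decoder neighbors of each output point, taken from a nearest-neighbor upsampling
    dec_up = F.interpolate(decoder_lr, size=(H, W), mode='nearest')
    neighbors = F.unfold(dec_up, kernel_size=window, padding=window // 2)   # (B, C*k*k, H*W)
    neighbors = neighbors.view(B, C, window * window, H * W)
    # the high-res encoder point acts as the query that decides the affiliation
    query = encoder_hr.view(B, C, 1, H * W)
    sim = (query * neighbors).sum(dim=1) / C ** 0.5                         # (B, k*k, H*W)
    weights = sim.softmax(dim=1)                                            # similarity-aware kernel
    out = (neighbors * weights.unsqueeze(1)).sum(dim=2)                     # (B, C, H*W)
    return out.view(B, C, H, W)
```

Because the kernel is a softmax over the local window, each upsampled value stays within the convex hull of its low-res decoder neighbors while the encoder feature steers it toward the correct semantic cluster at boundaries.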
Most recent works on optical flow use convex upsampling as the last step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this currently widely adopted convex upsampling approach. We propose a series of changes, in an attempt to resolve current issues. First, we propose to decouple the weights for the final convex upsampler, making it easier to find the correct convex combination. For the same reason, we also provide extra contextual features to the convex upsampler. Then, we increase the convex mask size by using an attention-based alternative convex upsampler; Transformers for Convex Upsampling. This upsampler is based on the observation that convex upsampling can be reformulated as attention, and we propose to use local attention masks as a drop-in replacement for convex masks to increase the mask size. We provide empirical evidence that a larger mask size increases the likelihood of the existence of the convex combination. Lastly, we propose an alternative training scheme to remove bilinear interpolation artifacts from the model output. Our proposed ideas could theoretically be applied to almost every current state-of-the-art optical flow architecture. On the FlyingChairs + FlyingThings3D training setting we reduce the Sintel Clean training end-point-error of RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and that of FlowFormer from 0.94 to 0.90, by solely adapting the convex upsampler.
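For context, the standard convex upsampler that this line of work analyzes and modifies (as used in RAFT-style flow networks) can be sketched as follows; this is the baseline formulation, not the paper's proposed attention-based variant:

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask, factor=8):
    """flow: (B, 2, H, W) coarse flow; mask: (B, 9*factor*factor, H, W) predicted logits."""
    B, _, H, W = flow.shape
    # softmax over the 3x3 neighborhood gives a convex combination per fine-grid pixel
    mask = mask.view(B, 1, 9, factor, factor, H, W).softmax(dim=2)
    neighbors = F.unfold(factor * flow, kernel_size=3, padding=1)    # flow magnitudes scale with resolution
    neighbors = neighbors.view(B, 2, 9, 1, 1, H, W)
    up = (mask * neighbors).sum(dim=2)                               # (B, 2, factor, factor, H, W)
    up = up.permute(0, 1, 4, 2, 5, 3).reshape(B, 2, factor * H, factor * W)
    return up
```

The paper's point is that finding a good convex combination with a fixed 3x3 mask and entangled weights is hard, which motivates decoupled weights, extra context, and larger attention-based masks.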
Video tokenizers are essential for latent video diffusion models, converting raw video data into spatiotemporally compressed latent spaces for efficient training. However, extending state-of-the-art video tokenizers to achieve a temporal compression ratio beyond 4x without increasing channel capacity poses significant challenges. In this work, we propose an alternative approach to enhance temporal compression. We find that the reconstruction quality of temporally subsampled videos from a low-compression encoder surpasses that of high-compression encoders applied to original videos. This indicates that high-compression models can leverage representations from lower-compression models. Building on this insight, we develop a bootstrapped high-temporal-compression model that progressively trains high-compression blocks atop well-trained lower-compression models. Our method includes a cross-level feature-mixing module to retain information from the pretrained low-compression model and guide higher-compression blocks to capture the remaining details from the full video sequence. Evaluation of video benchmarks shows that our method significantly improves reconstruction quality while increasing temporal compression compared to directly training the full model. Furthermore, the resulting compact latent space effectively trains a video diffusion model for high-quality video generation with a significantly reduced token budget.
Applying an image processing algorithm independently to each video frame often leads to temporal inconsistency in the resulting video. To address this issue, we present a novel and general approach for blind video temporal consistency. Our method is only trained on a pair of original and processed videos directly instead of a large dataset. Unlike most previous methods that enforce temporal consistency with optical flow, we show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior (DVP). Moreover, a carefully designed iteratively reweighted training strategy is proposed to address the challenging multimodal inconsistency problem. We demonstrate the effectiveness of our approach on 7 computer vision tasks on videos. Extensive quantitative and perceptual experiments show that our approach obtains superior performance than state-of-the-art methods on blind video temporal consistency. We further extend DVP to video propagation and demonstrate its effectiveness in propagating three different types of information (color, artistic style, and object segmentation). A progressive propagation strategy with pseudo labels is also proposed to enhance DVP's performance on video propagation. Our source codes are publicly available at https://github.com/ChenyangLEI/deep-video-prior.
Video enhancement plays an important role in various video applications. In this paper, we propose a new intra-and-inter-constraint-based video enhancement approach aiming to 1) achieve high intra-frame quality of the entire picture where multiple regions of interest (ROIs) can be adaptively and simultaneously enhanced, and 2) guarantee the inter-frame quality consistencies among video frames. We first analyze features from different ROIs and create a piecewise tone mapping curve for the entire frame such that the intra-frame quality of a frame can be enhanced. We further introduce new inter-frame constraints to improve the temporal quality consistency. Experimental results show that the proposed algorithm clearly outperforms the state-of-the-art algorithms.
Video data exhibits complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities. To effectively capture this diverse motion pattern, this paper presents a new temporal adaptive module ({\bf TAM}) to generate video-specific temporal kernels based on its own feature map. TAM proposes a unique two-level adaptive modeling scheme by decoupling the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a modular block and could be integrated into 2D CNNs to yield a powerful video architecture (TANet) with a very small extra computational cost. The extensive experiments on Kinetics-400 and Something-Something datasets demonstrate that our TAM outperforms other temporal modeling methods consistently, and achieves state-of-the-art performance at similar complexity. The code is available at \url{ https://github.com/liu-zhy/temporal-adaptive-module}.
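A simplified PyTorch sketch of the two-level idea, with a local branch producing a location-sensitive importance map and a global branch producing a video-specific temporal kernel shared over locations; the fixed number of input frames and the layer choices are illustrative assumptions, not the released TANet code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAdaptiveModule(nn.Module):
    """Sketch: local importance map + globally generated dynamic 1D temporal kernel."""
    def __init__(self, channels, num_frames, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        # local branch: location-sensitive importance over a short temporal window
        self.local = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        # global branch: maps each channel's temporal profile to an aggregation kernel
        self.global_fc = nn.Sequential(
            nn.Linear(num_frames, 2 * num_frames), nn.ReLU(inplace=True),
            nn.Linear(2 * num_frames, kernel_size), nn.Softmax(dim=-1),
        )

    def forward(self, x):                                    # x: (B, C, T, H, W)
        B, C, T, H, W = x.shape
        x = x * torch.sigmoid(self.local(x))                 # rescale by the importance map
        profile = x.mean(dim=(3, 4)).reshape(B * C, T)       # per-(video, channel) temporal profile
        kernel = self.global_fc(profile).view(B * C, 1, self.kernel_size)
        # depthwise temporal convolution with the dynamic, location-invariant kernel
        seq = x.permute(3, 4, 0, 1, 2).reshape(H * W, B * C, T)
        out = F.conv1d(seq, kernel, padding=self.kernel_size // 2, groups=B * C)
        return out.reshape(H, W, B, C, T).permute(2, 3, 4, 0, 1)
```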
Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each $query$ element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related $key$ elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring. Code and pre-trained models are publicly available at https://github.com/linjing7/VR-Baseline
Video denoising refers to the problem of removing "noise" from a video sequence. Here the term "noise" is used in a broad sense to refer to any corruption or outlier or interference that is not the quantity of interest. In this work, we develop a novel approach to video denoising that is based on the idea that many noisy or corrupted videos can be split into three parts: the "low-rank layer", the "sparse layer", and a residual layer (which is small and bounded). We show, using extensive experiments, that our denoising approach outperforms the state-of-the-art denoising algorithms.
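A compact way to write such a three-layer split is the robust-PCA-style decomposition below, with frames stacked as columns of a matrix $M$; the specific norms and solver used by the paper are assumptions on our part:
$$ M = L + S + E, \qquad \min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1} \quad \text{s.t.} \quad \|M - L - S\|_{F} \le \varepsilon, $$
where the nuclear norm $\|L\|_{*}$ promotes a low-rank layer (the largely static background), the $\ell_1$ term captures sparse outliers and corruptions, and $E$ absorbs the small, bounded residual.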
How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR). In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem. On the one hand, the sequence-to-sequence model, which has proven capable of sequence modeling in the field of natural language processing, is explored for the first time in VR. Optimized serialization modeling shows potential in capturing long-range dependencies among frames. On the other hand, we equip the sequence-to-sequence model with an unsupervised optical flow estimator to maximize its potential. The flow estimator is trained with our proposed unsupervised distillation loss, which can alleviate the data discrepancy and inaccurate degraded optical flow issues of previous flow-based methods. With reliable optical flow, we can establish accurate correspondence among multiple frames, narrowing the domain difference between 1D language and 2D misaligned frames and improving the potential of the sequence-to-sequence model. S2SVR shows superior performance in multiple VR tasks, including video deblurring, video super-resolution, and compressed video quality enhancement. Code and models are publicly available at https://github.com/linjing7/VR-Baseline
Video restoration tasks, including super-resolution, deblurring, etc, are drawing increasing attention in the computer vision community. A challenging benchmark named REDS is released in the NTIRE19 Challenge. This new benchmark challenges existing methods from two aspects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges. First, to handle large motions, we devise a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner. Second, we propose a Temporal and Spatial Attention (TSA) fusion module, in which attention is applied both temporally and spatially, so as to emphasize important features for subsequent restoration. Thanks to these modules, our EDVR wins the championship and outperforms the second place by a large margin in all four tracks of the NTIRE19 video restoration and enhancement challenges. EDVR also demonstrates superior performance to state-of-the-art published methods on video super-resolution and deblurring. The code is available at https://github.com/xinntao/EDVR.
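A single-level sketch of offset-predicted deformable feature alignment, the building block that PCD cascades coarse-to-fine over a feature pyramid; the layer sizes are illustrative and the sketch relies on torchvision's deformable convolution rather than the official EDVR code (which additionally predicts modulation masks):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    """Aligns a neighboring frame's features to the reference frame at one pyramid level (sketch)."""
    def __init__(self, channels=64):
        super().__init__()
        # offsets are predicted from the concatenated neighbor and reference features
        self.offset_conv = nn.Conv2d(2 * channels, 2 * 3 * 3, 3, padding=1)
        self.dcn = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, neighbor_feat, ref_feat):
        offset = self.offset_conv(torch.cat([neighbor_feat, ref_feat], dim=1))
        # the neighbor's features are sampled at the deformed locations, i.e. implicitly aligned
        return self.dcn(neighbor_feat, offset)
```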
Non-local patch based methods were until recently state-of-the-art for image denoising but are now outperformed by CNNs. Yet they are still the state-of-the-art for video denoising, as video redundancy is a key factor to attain high denoising performance. The problem is that CNN architectures are hardly compatible with the search for self-similarities. In this work we propose a new and efficient way to feed video self-similarities to a CNN. The non-locality is incorporated into the network via a first non-trainable layer which finds for each patch in the input image its most similar patches in a search region. The central values of these patches are then gathered in a feature vector which is assigned to each image pixel. This information is presented to a CNN which is trained to predict the clean image. We apply the proposed architecture to image and video denoising. For the latter, patches are searched for in a 3D spatio-temporal volume. The proposed architecture achieves state-of-the-art results. To the best of our knowledge, this is the first successful application of a CNN to video denoising.
Motion blur is one of the most common degradation artifacts in dynamic scene photography. This paper reviews the NTIRE 2020 Challenge on Image and Video Deblurring. In this challenge, we present the evaluation results from 3 competition tracks as well as the proposed solutions. Track 1 aims to develop single-image deblurring methods focusing on restoration quality. On Track 2, the image deblurring methods are executed on a mobile platform to find the balance of the running speed and the restoration accuracy. Track 3 targets developing video deblurring methods that exploit the temporal relation between input frames. In each competition, there were 163, 135, and 102 registered participants and in the final testing phase, 9, 4, and 7 teams competed. The winning methods demonstrate the state-of-the-art performance on image and video deblurring tasks.
We consider denoising and deblurring problems for tensors. While images can be discretized as matrices, the analogous procedure for color images or videos leads to a tensor formulation. We extend the classical ROF functional for variational denoising and deblurring to the tensor case by employing multi-dimensional total variation regularization. Furthermore, the resulting minimization problem is calculated by the FISTA method generalized to the tensor case. We provide some numerical experiments by applying the scheme to the denoising, the deblurring, and the recoloring of color images as well as to the deblurring of videos.
In this paper, we propose a learning-based approach for denoising raw videos captured under low lighting conditions. We propose to do this by first explicitly aligning the neighboring frames to the current frame using a convolutional neural network (CNN). We then fuse the registered frames using another CNN to obtain the final denoised frame. To avoid directly aligning the temporally distant frames, we perform the two processes of alignment and fusion in multiple stages. Specifically, at each stage, we perform the denoising process on three consecutive input frames to generate the intermediate denoised frames which are then passed as the input to the next stage. By performing the process in multiple stages, we can effectively utilize the information of neighboring frames without directly aligning the temporally distant frames. We train our multi-stage system using an adversarial loss with a conditional discriminator. Specifically, we condition the discriminator on a soft gradient mask to prevent introducing high-frequency artifacts in smooth regions. We show that our system is able to produce temporally coherent videos with realistic details. Furthermore, we demonstrate through extensive experiments that our approach outperforms state-of-the-art image and video denoising methods both numerically and visually.
Video deblurring is still an unsolved problem due to the challenging spatio-temporal modeling process. Existing convolutional neural network-based methods, however, show a limited capacity for effective spatial and temporal modeling for video deblurring. This paper presents VDTR, an effective Transformer-based model that makes the first attempt to adapt Transformer for video deblurring. VDTR exploits the superior long-range and relation modeling capabilities of Transformer for both spatial and temporal modeling. However, it is challenging to design an appropriate Transformer-based model for video deblurring due to the complicated non-uniform blurs, misalignment across multiple frames and the high computational costs for high-resolution spatial modeling. To address these problems, VDTR advocates performing attention within non-overlapping windows and exploiting the hierarchical structure for long-range dependencies modeling. For frame-level spatial modeling, we propose an encoder-decoder Transformer that utilizes multi-scale features for deblurring. For multi-frame temporal modeling, we adapt Transformer to fuse multiple spatial features efficiently. Compared with CNN-based methods, the proposed method achieves highly competitive results on both synthetic and real-world video deblurring benchmarks, including DVD, GOPRO, REDS and BSD. We hope such a Transformer-based architecture can serve as a powerful alternative baseline for video deblurring and other video restoration tasks. The source code will be available at \url{https://github.com/ljzycmd/VDTR}.
Due to its high speed and low latency, the dynamic vision sensor (DVS) is frequently employed in motion deblurring. Ideally, high-quality events would adeptly capture intricate motion information. However, real-world events are generally degraded, thereby introducing significant artifacts into the deblurred results. In response to this challenge, we model the degradation of events and propose RDNet to improve the quality of image deblurring. Specifically, we first analyze the mechanisms underlying degradation and simulate paired events based on that. These paired events are then fed into the first stage of the RDNet for training the restoration model. The events restored in this stage serve as a guide for the second-stage deblurring process. To better assess the deblurring performance of different methods on real-world degraded events, we present a new real-world dataset named DavisMCR. This dataset incorporates events with diverse degradation levels, collected by manipulating environmental brightness and target object contrast. Our experiments are conducted on synthetic datasets (GOPRO), real-world datasets (REBlur), and the proposed dataset (DavisMCR). The results demonstrate that RDNet outperforms classical event denoising methods in event restoration. Furthermore, RDNet exhibits better performance in deblurring tasks compared to state-of-the-art methods. DavisMCR is available at https://github.com/Yeeesir/DVS_RDNet.
Video deblurring aims at recovering sharp details from a sequence of blurry frames. Despite the proliferation of depth sensors in mobile phones and the potential of depth information to guide deblurring, depth-aware deblurring has received only limited attention. In this work, we introduce the 'Depth-Aware VIdeo DEblurring' (DAVIDE) dataset to study the impact of depth information in video deblurring. The dataset comprises synchronized blurred, sharp, and depth videos. We investigate how the depth information should be injected into the existing deep RGB video deblurring models, and propose a strong baseline for depth-aware video deblurring. Our findings reveal the significance of depth information in video deblurring and provide insights into the use cases where depth cues are beneficial. In addition, our results demonstrate that while the depth improves deblurring performance, this effect diminishes when models are provided with a longer temporal context. Project page: https://germanftv.github.io/DAVIDE.github.io/ .
Video restoration, which aims to restore clear frames from degraded videos, has numerous important applications. The key to video restoration depends on utilizing inter-frame information. However, existing deep learning methods often rely on complicated network architectures, such as optical flow estimation, deformable convolution, and cross-frame self-attention layers, resulting in high computational costs. In this study, we propose a simple yet effective framework for video restoration. Our approach is based on grouped spatial-temporal shift, which is a lightweight and straightforward technique that can implicitly capture inter-frame correspondences for multi-frame aggregation. By introducing grouped spatial shift, we attain expansive effective receptive fields. Combined with basic 2D convolution, this simple framework can effectively aggregate inter-frame information. Extensive experiments demonstrate that our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost, on both video deblurring and video denoising tasks. These results indicate the potential for our approach to significantly reduce computational overhead while maintaining high-quality results. Code is available at https://github.com/dasongli1/Shift-Net.
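A toy sketch of the grouped spatial-temporal shift idea; the real method uses more groups and shift directions, and the shapes, group split, and use of wrap-around rolling below are illustrative simplifications rather than the released Shift-Net code:

```python
import torch

def grouped_spatiotemporal_shift(feats, shift=4):
    """feats: (B, T, C, H, W); returns shifted features of the same shape."""
    B, T, C, H, W = feats.shape
    out = feats.clone()
    g = C // 4
    # temporal shift: first channel group borrows from the previous frame, second from the next
    out[:, 1:, :g] = feats[:, :-1, :g]
    out[:, :-1, g:2 * g] = feats[:, 1:, g:2 * g]
    # spatial shift: third group moves horizontally, fourth vertically (roll used here for brevity)
    out[:, :, 2 * g:3 * g] = torch.roll(feats[:, :, 2 * g:3 * g], shifts=shift, dims=-1)
    out[:, :, 3 * g:] = torch.roll(feats[:, :, 3 * g:], shifts=shift, dims=-2)
    return out
```

Plain 2D convolutions applied after such a shift see pixels from neighboring frames and displaced spatial positions, which is where the implicit multi-frame aggregation and the enlarged effective receptive field come from.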
Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of the low-resolution video by reconstructing the high-resolution video. The key challenges in SVSR are preserving the stereo-consistency and temporal-consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The parallax attention mechanism (PAM) that uses the cross-view information to consider the significant disparities is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured using a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that the Trans-SVSR can achieve competitive performance compared to the state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/
This report organizes research on video enhancement and super-resolution into ten dimensions. The technical roadmap has evolved from classical CNN propagation mechanisms, to Transformer/Mamba long-range modeling, and on to generative reconstruction with diffusion models; application scenarios have expanded from general-purpose enhancement to vertical domains such as satellite remote sensing, face restoration, and Raw-domain multi-frame fusion; on the engineering side, the field balances joint space-time super-resolution, generalization to blind degradations, lightweight deployment, and deep integration with video coding standards. Overall, the trend is a shift from pure pixel reconstruction toward perceptual enhancement, from fixed scale factors toward arbitrary scales, and from laboratory settings toward real-world, complex degradation scenarios.