Beyond CNNs and ViTs, there have been other novel approaches and modules in computer vision that don't directly extend these architectures. Here are some:
1. **Capsule Networks (CapsNets)**:
- Proposed by Sabour, Frosst, and Hinton, they introduce "capsules": small groups of neurons whose activity vectors capture both the presence of an entity and its pose (instantiation parameters).
- Designed to address the shortcomings of CNNs, especially in terms of spatial relationships and viewpoint variations.
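A core piece of the CapsNet paper is the "squashing" non-linearity, which maps a capsule's output vector to a length in [0, 1) while preserving its direction, so vector length can be read as an entity-presence probability. A minimal NumPy sketch (the example capsule vectors are made up for illustration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # CapsNet squashing: short vectors shrink toward length ~0,
    # long vectors saturate toward length ~1; direction is preserved.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

caps = np.array([[3.0, 4.0],    # a confident capsule (long vector)
                 [0.01, 0.0]])  # an inactive capsule (short vector)
v = squash(caps)
```

The full architecture combines this with dynamic routing-by-agreement between capsule layers, which is omitted here.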
2. **Neural Ordinary Differential Equations (Neural ODEs)**:
- Instead of having discrete layers, Neural ODEs model the transformation of data as a continuous process.
- They use ODE solvers to find the final state of the network, which can be seen as the output after an infinite number of infinitesimally small transformations.
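The idea can be sketched with a fixed-step Euler solver standing in for the adaptive solvers (e.g. Dormand-Prince) used in practice; the linear-plus-tanh vector field below is an arbitrary stand-in for a learned dynamics function:

```python
import numpy as np

def f(h, t, W):
    # The "learned" dynamics dh/dt = f(h, t); fixed weights here.
    return np.tanh(W @ h)

def odeint_euler(f, h0, t0, t1, steps, W):
    # Integrate the hidden state continuously from t0 to t1 instead of
    # applying a fixed stack of discrete layers.
    h, dt = h0.copy(), (t1 - t0) / steps
    for i in range(steps):
        h = h + dt * f(h, t0 + i * dt, W)
    return h

W = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation-like dynamics
h1 = odeint_euler(f, np.array([1.0, 0.0]), 0.0, 1.0, 100, W)
```

Refining the step count changes the output only slightly, which is the sense in which depth becomes a continuous quantity controlled by solver tolerance rather than architecture.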
3. **Dynamic Vision Sensor (DVS)**:
- Not a network architecture, but a novel type of sensor that captures changes in pixel intensity rather than absolute intensities.
- It results in sparse, event-driven data which has led to the development of new algorithms and network architectures specifically for event-driven data.
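The sensor's behaviour can be approximated from ordinary frames: a pixel fires an event only when its log-intensity changes by more than a contrast threshold, with polarity encoding the sign. A toy simulation (threshold value and frame contents are illustrative):

```python
import numpy as np

def events_from_frames(prev, curr, threshold=0.2):
    # Emit (x, y, polarity) events where log-intensity change exceeds
    # the contrast threshold; static pixels produce no data at all.
    diff = np.log(curr + 1e-6) - np.log(prev + 1e-6)
    ys, xs = np.where(np.abs(diff) > threshold)
    return [(x, y, 1 if diff[y, x] > 0 else -1) for y, x in zip(ys, xs)]

prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 1.0  # one pixel brightens; everything else is static
evts = events_from_frames(prev, curr)
```

The sparsity is the point: a static scene yields zero events, which is why event-driven architectures (e.g. spiking networks) pair naturally with this sensor.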
4. **Spatial Transformer Networks**:
- Allow a network to apply learned spatial transformations to feature maps selectively, giving it the ability to zoom, rotate, and skew features as needed.
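The sampling step at the heart of an STN maps each output pixel through a 2x3 affine matrix in normalized [-1, 1] coordinates, then samples the input. A minimal sketch using nearest-neighbour sampling (the original paper uses bilinear sampling so the module stays differentiable, and the matrix would be predicted by a localization network rather than hand-set):

```python
import numpy as np

def affine_transform(img, theta):
    # Warp img by the 2x3 affine matrix theta in normalized coordinates.
    H, W = img.shape
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            yn = 2 * i / (H - 1) - 1          # output coords in [-1, 1]
            xn = 2 * j / (W - 1) - 1
            xs, ys = theta @ np.array([xn, yn, 1.0])
            si = int(round((ys + 1) * (H - 1) / 2))  # back to pixel coords
            sj = int(round((xs + 1) * (W - 1) / 2))
            if 0 <= si < H and 0 <= sj < W:
                out[i, j] = img[si, sj]
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
flip = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # horizontal flip
```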
5. **HyperNetworks**:
- A network (HyperNet) that generates weights for another network. Useful for meta-learning and situations where the model needs to rapidly adapt to new tasks.
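The mechanism can be shown in a few lines: a task embedding `z` is mapped to the weight matrix of a target linear layer, so changing `z` re-parameterizes the target network without touching its weights directly. All names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

class HyperLinear:
    # A toy hypernetwork: G maps an embedding z to the flattened
    # weights of a target (out_dim x in_dim) linear layer.
    def __init__(self, z_dim, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim
        self.G = rng.normal(0, 0.1, (z_dim, in_dim * out_dim))

    def __call__(self, z, x):
        W = (z @ self.G).reshape(self.out_dim, self.in_dim)
        return W @ x

layer = HyperLinear(z_dim=4, in_dim=3, out_dim=2)
y = layer(rng.normal(size=4), rng.normal(size=3))
```

In meta-learning, only `z` (or the generator `G`) is adapted per task, which is what makes rapid adaptation cheap.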
6. **Differentiable Neural Computers (DNC)**:
- Combines neural networks with external memory resources, allowing them to learn algorithms and perform tasks that traditional networks struggle with, like sorting.
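One ingredient of the DNC, content-based addressing, is easy to sketch: the controller emits a read key, cosine similarity against each memory row is sharpened and softmaxed into read weights, and the read vector is the weighted sum of rows. The memory contents and sharpening factor below are illustrative:

```python
import numpy as np

def content_read(memory, key, beta=10.0):
    # Content-based addressing: soft lookup by similarity, so the whole
    # read operation stays differentiable end to end.
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)   # beta sharpens the focus
    w /= w.sum()
    return w @ memory

memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
r = content_read(memory, np.array([0.9, 0.1, 0.0]))  # noisy query for row 0
```

The full DNC adds differentiable write heads, usage tracking, and temporal links, which this sketch omits.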
7. **SIREN (Implicit Neural Representations with Periodic Activation Functions)**:
- Uses periodic activation functions to represent complex functions and signals, showing strong performance in tasks like image reconstruction and 3D shape representation.
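The forward pass is just an MLP with `sin(omega_0 * (Wx + b))` at every hidden layer; the uniform initialization range below follows the scheme recommended in the SIREN paper, while the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def siren_forward(x, weights, omega0=30.0):
    # Sine activations let the network fit high-frequency signals and
    # keep every derivative of the representation well-defined.
    h = x
    for W, b in weights[:-1]:
        h = np.sin(omega0 * (h @ W + b))
    W, b = weights[-1]
    return h @ W + b

dims = [1, 32, 32, 1]  # coordinate in -> signal value out
weights = [(rng.uniform(-np.sqrt(6 / d_in), np.sqrt(6 / d_in), (d_in, d_out)),
            np.zeros(d_out)) for d_in, d_out in zip(dims[:-1], dims[1:])]
coords = np.linspace(-1, 1, 5).reshape(-1, 1)
out = siren_forward(coords, weights)
```

Trained on (pixel coordinate, color) pairs, such a network becomes an implicit representation of a single image or shape.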
8. **Transformers in Vision without Attention**:
- While transformers are known for attention mechanisms, there are efforts to use transformer architectures without attention, relying on feed-forward mechanisms and other strategies.
Reference: arXiv:2301.02240
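One well-known attention-free strategy (MLP-Mixer-style token mixing, shown here as a representative example rather than the specific method of the linked paper) replaces attention with an MLP applied across the token dimension, so every patch still exchanges information with every other patch:

```python
import numpy as np

rng = np.random.default_rng(0)

def token_mix(tokens, W1, W2):
    # Token mixing without attention: transpose so the MLP acts across
    # tokens (not channels), then transpose back; add a residual.
    t = tokens.T                      # (channels, tokens)
    t = np.maximum(t @ W1, 0) @ W2    # MLP over the token axis
    return tokens + t.T               # residual connection

n_tokens, channels, hidden = 16, 8, 32
tokens = rng.normal(size=(n_tokens, channels))  # e.g. 16 patch embeddings
W1 = rng.normal(0, 0.1, (n_tokens, hidden))
W2 = rng.normal(0, 0.1, (hidden, n_tokens))
mixed = token_mix(tokens, W1, W2)
```

Unlike attention, the mixing weights here are fixed per position rather than computed from the input, which is exactly the trade-off these architectures explore.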
9. **Neural Radiance Fields (NeRF)**:
- A method to represent 3D scenes using continuous volumetric scene functions, particularly useful for view synthesis.
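A small but essential piece of NeRF is its positional (frequency) encoding gamma(p), which lifts raw coordinates to sin/cos features at octave frequencies so the scene MLP can represent high-frequency detail; the sample point and number of frequencies below are illustrative:

```python
import numpy as np

def positional_encoding(p, L=4):
    # gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^{L-1} pi p), ...)
    feats = []
    for k in range(L):
        feats.append(np.sin(2 ** k * np.pi * p))
        feats.append(np.cos(2 ** k * np.pi * p))
    return np.concatenate(feats, axis=-1)

xyz = np.array([0.1, 0.5, -0.3])        # a 3D sample point along a camera ray
gamma = positional_encoding(xyz, L=4)   # 3 coords * 2 funcs * 4 freqs = 24
```

The encoded point (plus an encoded view direction) is fed to an MLP that outputs density and color, which volume rendering then integrates along each ray.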
10. **PointNet**:
- Designed for point cloud processing. Instead of voxels or multi-view images, PointNet consumes raw, unordered point sets directly, making it useful for 3D vision tasks.
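PointNet's core trick is a shared per-point MLP followed by a symmetric function (max pooling), which makes the global feature invariant to the ordering of the points. A one-layer sketch with arbitrary weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_global_feature(points, W):
    # Shared MLP applied identically to every point, then max pooling:
    # a symmetric aggregation, so point order cannot affect the result.
    per_point = np.maximum(points @ W, 0)   # (N, 3) -> (N, 64)
    return per_point.max(axis=0)            # order-invariant global feature

points = rng.normal(size=(100, 3))          # an unordered point cloud
W = rng.normal(size=(3, 64))
feat = pointnet_global_feature(points, W)
shuffled = points[rng.permutation(100)]     # same cloud, different order
```

The full architecture stacks several shared layers and adds learned input/feature alignment (T-Nets), but the symmetric-function idea above is what handles the unordered input.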
While many of the recent innovations are influenced by the success of CNNs and ViTs, researchers are continuously exploring entirely new paradigms and mechanisms to push the boundaries of what's possible in computer vision.