I'm a bit of an eclectic mess 🙂 I've been a programmer, journalist, editor, TV producer, and a few other things.
I'm currently working on my second novel, which is complete but in the editing stage. I wrote my first novel over 20 years ago and didn't write much after that until now.
"In-situ Water quality monitoring in Oil and Gas operations. (arXiv:2301.08800v1 [cs.CV])" — a model designed to enable users to determine contamination levels in water bodies with weak reflectance patterns such as small ponds based on satellite images.
"Visual Semantic Relatedness Dataset for Image Captioning. (arXiv:2301.08784v1 [cs.CL])" — A textual visual context dataset for captioning, in which the publicly available dataset COCO Captions has been extended with information about the scene (such as objects in the image).
This particular #StableDiffusion prompt based on Terry Pratchett novel titles was in the works for a few days — I just wasn't sure about the results since most of them were fairly similar ...
The prompt? "Wintersmith"
Predictably, most of the results were people in winter-wear. I just didn't like the monotony and hence the inclusion of the fox from what looks like the cover of a box of pencils 😛
"Model Complexity-Accuracy Trade-off for a Convolutional Neural Network. (arXiv:1705.03338v1 [cs.CV] CROSS LISTED)" — A study of the model complexity versus accuracy trade-off on MNSIT dataset, providing a concrete framework for handling such a problem, given the worst case accuracy that a system can tolerate.
"MemeTector: Enforcing deep focus for meme detection. (arXiv:2205.13268v2 [cs.CV] UPDATED)" — A methodology that utilizes the visual part of image memes as instances of the regular image class and the initial image memes as instances of the image meme class to force the model to concentrate on the critical parts that characterize an image meme.
"Learning Sequential Latent Variable Models from Multimodal Time Series Data. (arXiv:2204.10419v2 [cs.LG] UPDATED)" — A self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics to improve prediction and representation quality.
"REx: Data-Free Residual Quantization Error Expansion. (arXiv:2203.14645v2 [cs.CV] UPDATED)" — A quantization method that leverages residual error expansion, along with group sparsity and an ensemble approximation for better parallelization.
"Novel-View Acoustic Synthesis. (arXiv:2301.08730v1 [cs.CV])" — Given the sight and sound observed at a source viewpoint, synthesizing the *sound* of that scene from an unseen target viewpoint using a neural rendering approach.
"Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences. (arXiv:2301.08571v1 [cs.CL])" — A new image-grounded dataset for improving visual story generation due to the fact that existing image sequence collections do not have coherent plots behind them.
"When Source-Free Domain Adaptation Meets Label Propagation. (arXiv:2301.08413v1 [cs.CV])" — An approach that tries to achieve efficient feature clustering from the perspective of label propagation by dividing the target data into inner and outlier samples based on the adaptive threshold of the learning state, and applying a customized learning strategy to best fits the data property.
"Open-Set Likelihood Maximization for Few-Shot Learning. (arXiv:2301.08390v1 [cs.CV])" — A generalization of the maximum likelihood principle, in which latent scores down-weighing the influence of potential outliers are introduced alongside the usual parametric model. This implementation can be applied on top of any pre-trained model seamlessly.