I'm a bit of an eclectic mess 🙂 I've been a programmer, journalist, editor, TV producer, and a few other things.
I'm currently working on my second novel, which is complete but still in the editing stage. I wrote my first novel over 20 years ago but didn't write much until now.
"Region-Aware Diffusion for Zero-shot Text-driven Image Editing. (arXiv:2302.11797v1 [cs.CV])" — A region-aware text-guided image editing method which aims to replace one entity with another.
What I always wonder with these approaches is whether you can replace a larger entity with a smaller one, or vice versa (say, a horse with a cat), in a way that looks realistic.
"Controlled and Conditional Text to Image Generation with Diffusion Prior. (arXiv:2302.11710v1 [cs.CV])" — Using a Diffusion Prior to constrain the generation to a specific domain without altering the larger Diffusion Decoder in a memory and compute efficient way.
"Composer: Creative and Controllable Image Synthesis with Composable Conditions. (arXiv:2302.09778v2 [cs.CV] UPDATED)" — A way to flexibly control the output image from diffusion models to modify the layout or style of the final image.
"Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities. (arXiv:2302.11154v1 [cs.CV])" — Creating a non-task/domain specific, general visual recognition model.
"'The Taurus': Cattle Breeds & Diseases Identification Mobile Application using Machine Learning. (arXiv:2302.10920v1 [cs.LG])" — A cross-platform mobile application to identify cattle breeds, easily analyze and identify the diseases which cattle suffer from, and to provide solutions the identified diseases.
"Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness. (arXiv:2302.10893v1 [cs.LG])" — Reducing bias in generative text-to-image models based on instructions.
"Learning 3D Photography Videos via Self-supervised Diffusion on Single Images. (arXiv:2302.10781v1 [cs.CV])" — Transforming static images into videos with additional effects using a diffusion model to handle the inpainting.
"RealFusion: 360{\deg} Reconstruction of Any Object from a Single Image. (arXiv:2302.10663v1 [cs.CV])" — Creating a 360-degree photographic model of an object from a single image of it by fitting a neural radiance field to the image.
"Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels. (arXiv:2302.10586v1 [cs.CV])" — A three-stage training strategy for conditional image generation and classification in semi-supervised learning.
"A Comparative Analysis of CNN-Based Pretrained Models for the Detection and Prediction of Monkeypox. (arXiv:2302.10277v1 [cs.CV])" — Using Convolutional Neural Networks (CNN) to detect monkeypox since it's difficult to diagnose early due to its similarity to other diseases like chickenpox and measles.
"Vulnerability analysis of captcha using Deep learning. (arXiv:2302.09389v1 [cs.CR])" — Using a Convolutional Neural Network (CNN) model to predict text-based CAPTCHAs to examine the flaws inherent in the system and to create more resilient CAPTCHAs.
"Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension. (arXiv:2302.09301v1 [cs.CL])" — An investigation into the basic geometric properties induced by prompts in Stable Diffusion and how this impact depends on the layer being considered.
"Web Photo Source Identification based on Neural Enhanced Camera Fingerprint. (arXiv:2302.09228v1 [cs.CV])" — Using a neural network to identify sensor patterns in an effort to identify the source camera for images published on the web.
"A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning. (arXiv:2302.09068v1 [cs.AI])" — An evaluation of ChatGPT and DALL-E2 to assess the spatial reasoning and decision making abilities of each model.
""Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction. (arXiv:2210.03735v2 [cs.HC] UPDATED)" — A study of how explainability can support human-AI interaction using a real-world AI applicaiton.
"Which country is this picture from? New data and methods for DNN-based country recognition. (arXiv:2209.02429v2 [cs.CV] UPDATED)" — A framework to identify the country where an image was taken, which could be useful in debunking fake news and many other applications.