I'm a bit of an eclectic mess. I've been a programmer, journalist, editor, TV producer, and a few other things.
I'm currently working on my second novel, which is complete but in the editing stage. I wrote my first novel over 20 years ago but didn't write much until now.
"Deep Learning for Identifying Iran's Cultural Heritage Buildings in Need of Conservation Using Image Classification and Grad-CAM. (arXiv:2302.14354v1 [cs.CV])" - Using machine learning to identify damage and defects in cultural heritage buildings using Convolutional Neural Networks (CNNs).
"Accuracy and Fidelity Comparison of Luna and DALL-E 2 Diffusion-Based Image Generation Systems. (arXiv:2301.01914v2 [cs.CV] UPDATED)" - Comparing the accuracy and fidelity of images generated by DALL-E 2 and Luna, which is Stable Diffusion-based.
"Diffusion Posterior Sampling for General Noisy Inverse Problems. (arXiv:2209.14687v3 [stat.ML] UPDATED)" - Extending diffusion solvers to efficiently handle general noisy (non)linear inverse problems by approximating posterior sampling.
"Subspace Diffusion Generative Models. (arXiv:2205.01490v2 [cs.LG] UPDATED)" - Restricting diffusion via projections onto subspaces to reduce computational time and cost without affecting the overall quality of the generated image.
"Large Scale Visual Food Recognition. (arXiv:2103.16107v3 [cs.CV] UPDATED)" - A food dataset with 2,000 categories and over 1 million images that can be used for food recognition.
"Directed Diffusion: Direct Control of Object Placement through Attention Guidance. (arXiv:2302.13153v1 [cs.CV])" - Controlling object placement in diffusion models by way of attention guidance.
"In What Languages are Generative Language Models the Most Formal? Analyzing Formality Distribution across Languages" - Measuring the formality of the generated text for different languages using multilingual generative language models.
"ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation. (arXiv:2209.04145v6 [cs.CV] UPDATED)" - Using 2D images as a stepping stone for creating 3D shapes and eliminating the need for paired text-shape data.
"Modulating Pretrained Diffusion Models for Multimodal Image Synthesis. (arXiv:2302.12764v1 [cs.CV])" - Multimodal Conditioning Modules (MCM) that enable conditional image synthesis with pretrained diffusion models, so you can generate images using not just a text prompt but also additional input such as a segmentation map or a sketch.
"Surface Recognition for e-Scooter Using Smartphone IMU Sensor. (arXiv:2302.12720v1 [eess.SP])" - Detecting whether an e-scooter is on a paved road or a sidewalk using the Inertial Measurement Unit (IMU) sensors on a smartphone.
"ZITS++: Image Inpainting by Improving the Incremental Transformer on Structural Priors. (arXiv:2210.05950v2 [cs.CV] UPDATED)" - Better image inpainting by detecting structures in the source image using techniques such as edge detection.
"Designing an Encoder for Fast Personalization of Text-to-Image Models. (arXiv:2302.12228v1 [cs.CV])" - A method to teach text-to-image models new concepts in seconds.
"Aligning Text-to-Image Models using Human Feedback. (arXiv:2302.12192v1 [cs.LG])" - A fine-tuning method for better aligning generated images to the input text prompt when using diffusion models.
"Evaluating the Efficacy of Skincare Product: A Realistic Short-Term Facial Pore Simulation. (arXiv:2302.11950v1 [cs.CV])" - Simulating the effects of skincare products on your skin (specifically the pores) to gauge the product's efficacy.
"Region-Aware Diffusion for Zero-shot Text-driven Image Editing. (arXiv:2302.11797v1 [cs.CV])" - A region-aware, text-guided image editing method that aims to replace one entity with another.
What I always wonder with these approaches is whether you can replace a larger entity with a smaller one or vice versa (say, a horse with a cat) in a way that looks realistic?
"Controlled and Conditional Text to Image Generation with Diffusion Prior. (arXiv:2302.11710v1 [cs.CV])" - Using a Diffusion Prior to constrain generation to a specific domain in a memory- and compute-efficient way, without altering the larger Diffusion Decoder.
"Composer: Creative and Controllable Image Synthesis with Composable Conditions. (arXiv:2302.09778v2 [cs.CV] UPDATED)" - A way to flexibly control the output image from diffusion models to modify the layout or style of the final image.
"Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities. (arXiv:2302.11154v1 [cs.CV])" - Creating a general visual recognition model that is not specific to any one task or domain.