I'm a bit of an eclectic mess 🙂 I've been a programmer, journalist, editor, TV producer, and a few other things.

I'm currently working on my second novel, which is complete but in the editing stage. I wrote my first novel over 20 years ago but then didn't write much until now.

I post about #Coding, #Flutter, #Writing, #Movies and #TV. I'll also talk about #Technology, #Gadgets, #MachineLearning, #DeepLearning and a few other things as the fancy strikes ...

Lived in: 🇱🇰🇸🇦🇺🇸🇳🇿🇸🇬🇲🇾🇦🇪🇫🇷🇪🇸🇵🇹🇶🇦🇨🇦

Fahim Farook

"Diffusion-based Image Translation using Disentangled Style and Content Representation. (arXiv:2209.15264v2 [cs.CV] UPDATED)" — A novel diffusion-based unsupervised image translation method using disentangled style and content representation inspired by the splicing Vision Transformer.

Paper: http://arxiv.org/abs/2209.15264
Code: https://github.com/cyclomon/DiffuseIT

#AI #CV #NewPaper #DeepLearning #MachineLearning

<<Find this useful? Please boost so that others can benefit too 🙂>>

Fahim Farook

"Stable Target Field for Reduced Variance Score Estimation in Diffusion Models. (arXiv:2302.00670v1 [cs.LG])" — A method to improve diffusion models by reducing the variance of the training targets in their denoising score-matching objective. This is achieved by incorporating a reference batch which is used to calculate weighted conditional scores as more stable training targets.

Paper: http://arxiv.org/abs/2302.00670
Code: https://github.com/Newbeeer/stf

#AI #CV #NewPaper #DeepLearning #MachineLearning
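
A toy 1-D sketch of the reference-batch idea, under stated assumptions rather than the code from the linked repo: it assumes a Gaussian perturbation kernel with noise level `sigma`, and the names `conditional_score` and `stable_target` are hypothetical.

```python
import math

def conditional_score(x_t, x0, sigma):
    """Score of a Gaussian kernel N(x_t; x0, sigma^2) with respect to x_t."""
    return (x0 - x_t) / sigma ** 2

def stable_target(x_t, reference_batch, sigma):
    """Weighted average of conditional scores over a reference batch.

    Assumption: weights are proportional to N(x_t; x_i, sigma^2) and are
    normalised to sum to 1.
    """
    log_w = [-((x_t - x_i) ** 2) / (2 * sigma ** 2) for x_i in reference_batch]
    shift = max(log_w)  # subtract the max before exp() for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(
        (w_i / total) * conditional_score(x_t, x_i, sigma)
        for w_i, x_i in zip(w, reference_batch)
    )

# With a single reference point the target is the usual single-sample target.
single = stable_target(0.3, [1.0], sigma=0.5)
# With a batch placed symmetrically around x_t the individual targets cancel.
symmetric = stable_target(0.0, [-1.0, 1.0], sigma=0.5)
```

Averaging over several reference points is what smooths the target here: each point alone would pull in its own direction, while the weighted mean changes less from sample to sample.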

Fahim Farook

"Continuous U-Net: Faster, Greater and Noiseless. (arXiv:2302.00626v1 [cs.CV])" — A novel family of continuous deep neural networks for image segmentation that introduces new dynamic blocks modelled by second-order ordinary differential equations to overcome some of the limitations of current U-Net architectures.

Paper: http://arxiv.org/abs/2302.00626

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Inching Towards Automated Understanding of the Meaning of Art: An Application to Computational Analysis of Mondrian's Artwork. (arXiv:2302.00594v1 [cs.CV])" — An attempt to identify capabilities related to semantic processing, a current limitation of Deep Neural Networks (DNNs). The missing capabilities are identified by comparing the process of understanding Mondrian's paintings with the process of understanding electronic circuit designs, another instance of creative problem solving.

Paper: http://arxiv.org/abs/2302.00594

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design. (arXiv:2302.00386v1 [cs.CV])" — A hardware-efficient convolutional neural network with a RepVGG-like architecture that is friendly to high-computation hardware (e.g. GPUs).

Paper: http://arxiv.org/abs/2302.00386

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Alphazzle: Jigsaw Puzzle Solver with Deep Monte-Carlo Tree Search. (arXiv:2302.00384v1 [cs.CV])" — A reassembly algorithm based on single-player Monte Carlo Tree Search (MCTS) which shows the importance of MCTS and the neural networks working together.

Paper: http://arxiv.org/abs/2302.00384

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Detection of Tomato Ripening Stages using Yolov3-tiny. (arXiv:2302.00164v1 [cs.CV])" — A computer vision system to detect tomatoes at different ripening stages by using a neural network-based model for tomato classification and detection.

Paper: http://arxiv.org/abs/2302.00164

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Real Estate Property Valuation using Self-Supervised Vision Transformers. (arXiv:2302.00117v1 [cs.CV])" — A new method for property valuation that utilizes self-supervised vision transformers and hedonic pricing models trained on real estate data to estimate the value of a given property.

Paper: http://arxiv.org/abs/2302.00117

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Debiasing Vision-Language Models via Biased Prompts. (arXiv:2302.00070v1 [cs.LG])" — A general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. Debiasing only the text embedding with a calibrated projection matrix is enough to yield robust classifiers and fair generative models.

Paper: http://arxiv.org/abs/2302.00070
Code: https://github.com/chingyaoc/debias_vl

#AI #CV #NewPaper #DeepLearning #MachineLearning
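
A minimal sketch of what "projecting out" a direction from an embedding means, using plain orthogonal projection. This is not the paper's calibrated projection matrix; the 3-d vectors and the `project_out` helper are hypothetical and only illustrate the basic linear algebra.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out(embedding, bias_direction):
    """Remove the component of `embedding` that lies along `bias_direction`."""
    norm_sq = dot(bias_direction, bias_direction)
    if norm_sq == 0:
        raise ValueError("bias_direction must be non-zero")
    scale = dot(embedding, bias_direction) / norm_sq
    return [e - scale * b for e, b in zip(embedding, bias_direction)]

# Hypothetical toy vectors; real text embeddings have hundreds of dimensions.
bias_direction = [1.0, 0.0, 0.0]
prompt_embedding = [0.8, 0.5, 0.2]

debiased = project_out(prompt_embedding, bias_direction)
# The result has no remaining component along the bias direction.
leftover = dot(debiased, bias_direction)
```

After the projection, any classifier or generator that consumes the embedding can no longer read information along that one direction, while the other components are left untouched.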

Fahim Farook

"NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks. (arXiv:2302.00059v1 [cs.CV])" — A novel approach that uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures inside siamese-networks-based contrastive learning frameworks (e.g., SimCLR, SimSiam, and MoCo) while preserving the simplicity of previous baselines.

Paper: http://arxiv.org/abs/2302.00059

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Video Influencers: Unboxing the Mystique. (arXiv:2012.12311v2 [cs.LG] UPDATED)" — A study and analysis of YouTube influencers and their unstructured video data across text, audio and images using a novel "interpretable deep learning" framework to determine the effectiveness of their constituent elements in explaining video engagement.

Paper: http://arxiv.org/abs/2012.12311

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models. (arXiv:2301.13826v1 [cs.CV])" — An approach that intervenes in the generative process of diffusion models on the fly during inference to improve the faithfulness of the generated images. It guides the model to refine the cross-attention units to attend to all subject tokens in the text prompt and strengthen - or excite - their activations, encouraging the model to generate all subjects described in the text prompt.

Paper: http://arxiv.org/abs/2301.13826
Code: https://github.com/AttendAndExcite/Attend-and-Excite

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Grounding Language Models to Images for Multimodal Generation. (arXiv:2301.13823v1 [cs.CL])" — An efficient method to ground pretrained text-only language models to the visual domain, enabling them to process and generate arbitrarily interleaved image-and-text data. This approach apparently works with any off-the-shelf language model.

Paper: http://arxiv.org/abs/2301.13823

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers. (arXiv:2301.13741v1 [cs.CV])" — A universal vision-language Transformer compression framework which can handle multiple generative and discriminative vision-language tasks such as Visual Reasoning, Image Caption, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, and Image Classification.

Paper: http://arxiv.org/abs/2301.13741

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"DisDiff: Unsupervised Disentanglement of Diffusion Probabilistic Models. (arXiv:2301.13721v1 [cs.CV])" — A new task, unsupervised disentanglement of diffusion probabilistic models (DPMs), which aims to take advantage of their remarkable modeling ability.

Paper: http://arxiv.org/abs/2301.13721

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Learning Data Representations with Joint Diffusion Models. (arXiv:2301.13622v1 [cs.LG])" — A joint diffusion model that simultaneously learns meaningful internal representations fit for both generative and predictive tasks, with superior performance reported across various tasks, including generative modeling, semi-supervised classification, and domain adaptation.

Paper: http://arxiv.org/abs/2301.13622

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"NP-Match: Towards a New Probabilistic Model for Semi-Supervised Learning. (arXiv:2301.13569v1 [cs.CV])" — Adapting neural processes (NPs) for semi-supervised image classification tasks to arrive at a solution with much less computational overhead, which can save time at both the training and the testing phases.

Paper: http://arxiv.org/abs/2301.13569
Code: https://github.com/jianf-wang/np-match

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Domain-Generalizable Multiple-Domain Clustering. (arXiv:2301.13530v1 [cs.LG])" — Given unlabeled samples from multiple source domains, an attempt to learn a shared classifier that assigns the examples to various clusters. The classifier is evaluated by predicting cluster assignments in a previously unseen domain.

Paper: http://arxiv.org/abs/2301.13530

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Fourier Sensitivity and Regularization of Computer Vision Models. (arXiv:2301.13514v1 [cs.CV])" — A principled study of the frequency sensitivity characteristics of deep neural networks, motivated by recent work showing that deep neural networks latch on to the Fourier statistics of training data and show increased sensitivity to Fourier-basis directions in the input.

Paper: http://arxiv.org/abs/2301.13514

#AI #CV #NewPaper #DeepLearning #MachineLearning

Fahim Farook

"Conversational Automated Program Repair" — A method to help developers automatically generate patches for bugs using Large Language Models (LLMs), with a conversational approach to patch generation and validation.

Paper: https://arxiv.org/abs/2301.13246

#AI #NewPaper #MachineLearning #SoftwareEngineering
