I'm a bit of an eclectic mess 🙂 I've been a programmer, journalist, editor, TV producer, and a few other things.
I'm currently working on my second novel, which is complete but still in the editing stage. I wrote my first novel over 20 years ago but then didn't write much till now.
Yesterday's Pratchett novel title was: "The Shepherd's Crown"
And that's the last of the #DiscWorld titles 😞 Sure there are a few others left like "Nation", "Dodger", and the "Bromeliad" stuff (not to mention "Johnny") but those don't really count as much here. I was reluctant to do this one since it feels (almost) like reading the last DiscWorld novel (and I haven't read anything much since then ...)
"Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation. (arXiv:2206.07771v2 [cs.CV] UPDATED)" — Synthesis of multiple types of content, such as dance-to-music or text-to-image, using a new diffusion mechanism that needs fewer steps.
"Write and Paint: Generative Vision-Language Models are Unified Modal Learners. (arXiv:2206.07699v2 [cs.CV] UPDATED)" — A unified model trained to write and paint concurrently.
"Text-driven Visual Synthesis with Latent Diffusion Prior. (arXiv:2302.08510v1 [cs.CV])" — Using diffusion models as the generic driver for diverse image generation tasks such as text-to-3D, image editing, and StyleGAN adaptation.
"3D-aware Conditional Image Synthesis. (arXiv:2302.08509v1 [cs.CV])" — Using a 2D input such as a segmentation or edge map to generate photo-realistic images from different perspectives/viewpoints.
"T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. (arXiv:2302.08453v1 [cs.CV])" — Controlling text-to-image diffusion models in a more granular fashion by using special adapters to provide extra guidance.
"MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation. (arXiv:2302.08113v1 [cs.CV])" — Controlling diffusion-based image generation so that you can specify image components, component placement etc. without any further fine-tuning.
"À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting. (arXiv:2302.07994v1 [cs.LG])" — Having multiple subsets of data trained on specific prompts and being able to compose the final model based on the prompts you select.
"PRedItOR: Text Guided Image Editing with Diffusion Prior. (arXiv:2302.07979v1 [cs.CV])" — Structure-preserving, text-guided image editing using diffusion models, without needing a base prompt or any fine-tuning of models.
Yesterday's Pratchett novel title prompt was: "Raising Steam".
Here's the thing about the prompt — I generated images on macOS initially and I was happy with the images I was getting since I was getting strange stuff. Nothing really to do with the prompt possibly, but all sorts of weird and wonderful landscapes 🙂
Then I switched to Windows for generation and suddenly all I'd get were trains or some sort of steam engine. Not a lot of variety ... No matter how many models I tried 😛
I've selected a mixed set from both sides for fair representation but I feel as if this needs more exploration ...
"Forward Pass: On the Security Implications of Email Forwarding Mechanism and Policy" — How email forwarding can create security vulnerabilities and allow spoofing.
"AI Chat Assistants can Improve Conversations about Divisive Topics" — A study looking at how Large Language Models can improve conversations on divisive topics by making the participants feel understood.
"CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context" — A tool that uses a reader’s publishing, reading, and saving history to provide personalised context to citations in papers that they’re reading.
"Stitchable Neural Networks. (arXiv:2302.06586v2 [cs.LG] UPDATED)" — A way to stitch together different pretrained models to produce networks of varying complexity and performance.
"Learning When to Say "I Don't Know". (arXiv:2209.04944v2 [cs.CV] UPDATED)" — A method to teach learning systems when they don't know something, or at least to identify areas of uncertainty. Perhaps this should be tried with ChatGPT and Bing to mitigate all the gaslighting? 😛
"SoK: Anti-Facial Recognition Technology. (arXiv:2112.04558v2 [cs.CR] UPDATED)" — An analysis of the currently available Anti-Facial Recognition (AFR) research and the pros and cons of the different approaches.
"Team Triple-Check at Factify 2: Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification. (arXiv:2302.07740v1 [cs.CL])" — Verifying facts across different modes (text and images) and types (claim and document).
"Video Probabilistic Diffusion Models in Projected Latent Space. (arXiv:2302.07685v1 [cs.CV])" — Generating high-resolution and coherent video using diffusion models.