Sayan Mondal’s photostory: ‘A picture is worth a thousand words’, but now ‘you only need a few words’ to create one

Artificial intelligence (AI) has made remarkable progress in recent years and has begun to penetrate many areas of our lives, including the world of art, well beyond the ‘AI camera’ features advertised by many smartphone manufacturers. Recent advances in deep neural networks have opened up new possibilities for creating unique visual content that challenges our traditional understanding of art. As a traditional photographer, I have been intrigued by the intersection of artificial intelligence and art. In mid-2022, when the first generation of these models became publicly available, I began experimenting with them to create AI-generated images and videos. The experience has been eye-opening, and I am excited to share my insights on the current state of AI art based on my experience with it.

Text-to-image models are a fascinating field within artificial intelligence: they take a natural-language description, known as a “prompt”, as input and generate an image that matches it. These models typically consist of a language model that transforms the input text into a latent representation, and a generative image model that produces an image conditioned on that representation. Stable Diffusion, developed by Stability AI, is an open-source and highly versatile text-to-image model that has garnered attention for its ability to generate high-quality images. It uses a diffusion process to generate images from continuous noise input, making it well suited to tasks such as image synthesis and in-painting. In design, the visual arts, and augmented reality, Stable Diffusion-based images are already being used to create realistic and visually appealing imagery for marketing, advertising, and branding.
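The diffusion idea at the heart of such models can be illustrated with a toy sketch: noise is added to an image step by step, and a neural network is trained to reverse the process (guided by the text prompt). The following is a minimal, hypothetical numpy illustration of the forward noising schedule only — it is a teaching sketch, not code from Stable Diffusion itself, and names like `alpha_bar` are illustrative conventions from the diffusion literature:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny stand-in "image": 8x8 grayscale values in [0, 1].
x0 = rng.random((8, 8))

# A simple linear noise schedule over T timesteps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative signal fraction at each timestep

def add_noise(x0, t, rng):
    """Sample the noised image x_t directly from x_0 using the closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x_early = add_noise(x0, 10, rng)    # still close to the original image
x_late = add_noise(x0, T - 1, rng)  # almost pure Gaussian noise
```

At early timesteps the image is barely perturbed; at the final timestep almost nothing of it remains. A trained denoising model runs this process backwards, starting from pure noise, which is how a prompt-conditioned diffusion model "paints" an image that never existed.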

Generative AI is a broader category of AI models that includes generative adversarial networks (GANs), among others. Unlike text-to-image models, GANs can generate images from other images or from noise, without the need for textual prompts. They consist of two neural networks, a generator and a discriminator: the generator creates new images, and the discriminator evaluates whether they are real or fake. The generator is trained to create images that fool the discriminator into thinking they are real. Nvidia's StyleGAN is one of the most famous GANs and has been used to generate photorealistic synthetic images, including stock imagery for video game creation.
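The adversarial tug-of-war described above can be shown with a deliberately tiny sketch: here both "networks" are single-parameter models over 1-D numbers rather than images, so this is purely an illustration of the training dynamic, not of how StyleGAN or any real GAN is implemented. All variable names are my own:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Generator: turns noise z ~ N(0, 1) into samples mu + sigma * z.
mu, sigma = 0.0, 1.0
# Discriminator: a logistic classifier D(x) = sigmoid(a * x + b).
a, b = 0.0, 0.0

lr, n = 0.05, 128
for _ in range(3000):
    real = rng.normal(3.0, 0.5, n)            # "real" data: N(3, 0.5)
    fake = mu + sigma * rng.standard_normal(n)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)),
    # i.e. learn to tell real samples from generated ones.
    d_real, d_fake = sigmoid(a * real + b), sigmoid(a * fake + b)
    a += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake) (the non-saturating GAN loss),
    # i.e. move generated samples toward where the discriminator says "real".
    z = rng.standard_normal(n)
    d_fake = sigmoid(a * (mu + sigma * z) + b)
    mu += lr * np.mean((1 - d_fake) * a)
    sigma += lr * np.mean((1 - d_fake) * a * z)

# After training, generated samples should cluster near the real data's mean.
```

The same two-player loop, scaled up to deep convolutional networks and image data, is what lets a GAN's generator learn to produce images the discriminator can no longer tell from real photographs.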

Google Brain's Imagen is a text-to-image model that combines a large transformer-based language model (a frozen T5 text encoder) with a cascade of diffusion models that generate an image and then upscale it. Its design pairs the semantic understanding of the language model with the visual quality of the diffusion models to produce high-quality images, and related Google models have extended the approach to video generation. Imagen can be used for a variety of applications, such as product design, movie production, and scientific visualization. OpenAI's DALL-E, introduced in 2021, is another well-known text-to-image model. DALL-E takes textual prompts as input and generates images that match the description; the original version used a transformer that generates images token by token, while its successor, DALL-E 2, combines text embeddings with a diffusion-based decoder. DALL-E has been used to generate a wide range of images, from surreal creatures to everyday objects, and its output can be used in educational materials, such as textbooks and e-learning modules, to create custom illustrations and diagrams that help explain complex concepts.

Then comes the famous Midjourney, a commercial AI art generator introduced in mid-2022. Unlike traditional text-to-image models that reward detailed descriptions, Midjourney, accessed through a bot on Discord, interprets the text input as a concept and brings it to life with a touch of magic and surrealism in its unique Midjourney style. Although it is speculated that Midjourney builds on technology similar to Stable Diffusion, this has never been disclosed. Midjourney's distinctive art style has been popular among hobbyists and new-age art creators, contributing to its quick success.


Text-to-image models and generative AI models have revolutionised the way images are created and synthesised. They have enabled applications such as creating illustrations for children's books, generating visualisations for scientific papers, designing products and environments, and sparking innovative ideas for fashion and interior designers. With the continued development of AI technology, these models are likely to become even more advanced and accessible, leading to new and exciting applications in the future.

From a legal perspective, AI-generated images have become a subject of concern, as they raise questions about ownership and copyright infringement. In the USA, a class-action lawsuit was filed against Stability AI, Midjourney, and DeviantArt – all providers of tools that generate images from text input using AI models. The lawsuit alleges that the algorithms behind these tools violate the intellectual property rights of several artists, claiming that the models were trained on copyrighted works without permission. This lawsuit is one of the first to focus on AI-generated art, raising important legal questions about the status of such works. The outcome of the case could have significant implications for the future of AI-generated art and for the use of copyrighted material in training AI models.

On the other side of human creativity, the increasing sophistication of AI-generated images raises a fundamental question: how do we define the value and authenticity of a photograph? A photograph captures a moment in time through the exposure of film or a sensor to light, yet a photorealistic AI-generated image can create a convincing visual representation of reality that challenges our perception of what is real. Likewise, while a canvas painting captures the unique expression of an artist's vision and skill, an AI-generated painting can be virtually indistinguishable from a human-created work. As AI technology continues to advance, the line between what is captured by a camera or painted by a human hand and what is created by an algorithm is becoming increasingly blurred, prompting a re-evaluation of the role of traditional art forms in society.

Despite concerns about copyright infringement and the devaluation of human creativity, the future of AI-generated art is full of exciting possibilities. The use of AI technology in art has the potential to create innovative and unique artworks that challenge our traditional ideas of what constitutes art. Artists can use algorithms to generate new and diverse forms of art, such as paintings, sculptures, and music, that were previously impossible to create. However, the growing prevalence of AI-generated art raises important questions about the role of human creativity in the creative process and the value of art in society. Some argue that AI-generated art can democratise art by making it more accessible, while others fear that it may commodify and devalue human creativity. The future of AI-generated art will depend on how we balance these competing perspectives and how we define the role of human creativity in shaping our cultural identity. 

Sayan is a biophysicist by training and has deep interest in computational creative processes using AI. He is also an avid bird photographer and a budding entrepreneur. He loves playing badminton and hanging out with close friends.
