What happens when you merge ChatGPT, Resemble AI, Studio D-ID and Midjourney?
I call it an AI selfie.
As the head of Lippincott’s Experience, Innovation and Engineering practice, I’m obsessed with the intersection of creativity and technology—a passion I get to put to work every day. In an effort to push the boundaries of AI and see what’s possible, I decided to experiment.
Here’s what happened when I put ChatGPT, Resemble AI, Studio D-ID and Midjourney to the test…
So how did I do it?
First, I needed to generate the script, using ChatGPT.
ChatGPT input: “Write a script introducing me in an intro video to be posted on LinkedIn. In it I would like to tell everyone that I created the entire script, image of myself, the video and presentation using only AI. It is an experiment to see just how quickly I could create and share it with my colleagues using the power of generative tools.”
My goal in writing the above prompt was to validate something I’ve been curious about: can the tool generate a script, and can it suggest tools to use or even steps to take within the script itself? While I already had some tools in mind, it would have been interesting if ChatGPT had made suggestions or introduced ideas I wasn’t thinking of.
ChatGPT generated this in response: “Hello and welcome to my LinkedIn profile! My name is [Your Name] and I am excited to share with you an experiment I recently conducted. I used AI to generate the entire script, image of myself, the video, and presentation you are viewing right now. I wanted to show my colleagues the power of using generative tools and just how quickly we can create professional-quality content. I hope you enjoy watching and please feel free to reach out to me for any questions or to discuss the use of AI in your own projects. Thank you for your time!”
Clearly, this wasn’t going to cut it, so I extended the prompt.
ChatGPT input: “Make it longer and funnier.”
I was curious whether ChatGPT would tether a simple, non-contextual follow-up to the initial request. Indeed it did:
ChatGPT generated this in response: “Welcome to my LinkedIn profile, my fellow digital natives! My name is [Your Name], and I am here to share with you an experiment that I like to call ‘AI-Selfie’. You see, I decided to push the boundaries of what is possible with AI and put it to the test. I wanted to see if I could create the ultimate self-promotion package using only AI-generated content. So, I sat down with my trusty computer, opened up my favorite AI tools, and got to work.
First, I used natural language processing to write this script you’re reading right now. It was a breeze, I just had to tell the AI what kind of tone I wanted and BOOM! Script written, no writer’s block to be seen.
Next, I used a generative model to create an image of myself. I know, I know, you’re thinking ‘But [Your Name], you’re already a good-looking individual, why would you need AI to make you look even better?’ Well, let me tell you, the AI version of me is like the photo-shopped version of me, but better! It’s like the AI version of me is the superhero version of me.
After that, I used another generative model to create a video of myself giving this introduction. I was able to choose my virtual background, my virtual outfit, and even my virtual facial expressions. It was like playing a video game, but instead of killing zombies, I was killing the concept of having to film oneself.
Finally, I used AI to design the presentation that you are currently viewing. The AI took care of the layout, colors, and even the typography. I must say, the AI did a better job than I ever could have.
So, there you have it, my experiment to see just how quickly and easily I could create professional-quality content using only AI. I hope you enjoyed watching, and if you’re feeling inspired, you can try this yourself. And if you’re not feeling inspired, well, at least you got to see my virtual self, which is almost as good as the real thing. And always remember, the future is here and it’s AI-tastic!”
As you can see, if you compare the generated script to the final product, I kept about 97% of what was generated. Later in my process, I made a few changes to inflection, tone and timing, but all in all, I’d say I stayed pretty faithful to ChatGPT’s generated story.
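For anyone who would rather script this step than type into the chat UI, the same request can be made against OpenAI’s chat completions REST endpoint. This is a minimal sketch, not what I actually did (I used the ChatGPT web interface); the model name is an assumption, and you’d need your own API key in the environment.

```python
# Sketch only: automating the script-generation step via OpenAI's
# chat completions REST endpoint, using just the standard library.
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_payload(prompt: str, model: str = "gpt-3.5-turbo") -> dict:
    """Package a single user prompt for the chat completions endpoint.
    The model name here is an assumption; any chat-capable model works."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def generate_script(prompt: str) -> str:
    """POST the prompt and return the assistant's reply text."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires OPENAI_API_KEY in the environment):
# print(generate_script("Write a script introducing me in an intro video "
#                       "to be posted on LinkedIn. Make it longer and funnier."))
```

The follow-up prompt (“Make it longer and funnier”) would simply be appended as a second user message in the same conversation.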
Next, I used Midjourney to generate an image of myself for the video.
I usually use Midjourney on my mobile device through the Discord app. However, since I was working on my Mac, I installed Discord there and used the desktop version instead. I found it easier to find and upload profile photos on the desktop, so I uploaded three pictures of myself into Midjourney (so that I could reference my likeness in prompts).
After I uploaded the photos into Midjourney, I right-clicked each one, copied its unique URL, and then referred to it in a prompt like this:
Midjourney input: “Stylize like a cartoon presentation of a professional creative director”
I tried many prompts before getting the result you see in the post. I used prompts that included “Pixar”, “Yellowstone”, “Safari”, “Accomplished” (lol), “Happy”, “Confident”, “Cartoon”, “Presenter”, “Keynote”, “Toy collector” (I mean, I am, #sorrynotsorry), and even “Villain”, yielding mostly hilarious outputs. I won’t bore you with all the prompts and outputs, but here are just a few you can make fun of me over:
Narrowing to the above version was tough. I had to work hard not to perseverate over the little oddities in many of the otherwise great results. For example, the logo isn’t fully “on” the t-shirt (see how the flannel has this Escher-esque moment to the left of the logo?), and there’s a strange brand on my hat.
However, I decided I loved whatever was trying to happen on the t-shirt and deemed it wonderfully creepy, adorable and oddly relevant to my sense of humor, so I went with it. I was also really into the uncanny resemblance to my home office as well as my attire in general. I often layer clothes like how the illustration depicts, and it’s pretty interesting that a few keywords can lead to that. I guess we’re all tropes of something, no matter how original we want to be?
**stops typing and contemplates life…**
Next, I moved on to generating my voice using Resemble AI.
In Resemble AI, I generated a voice print of my own. This step was eye-opening. Resemble allows you to either upload an audio file of you speaking or you can use its interface to read aloud nonsense prompts that are written to capture units of sound from the way you speak. By hearing you speak, the tool tries to distinguish the way you would articulate one word or another in the conversation. I opted to upload about 10 sentences of me speaking and ended up adding about 25 more as the first output wasn’t nearly close enough to how I sound. The more sentences you upload, the more the tool can identify the atomic parts of speech that are completely unique to you. I didn’t push the tool further, but my sense is that if I keep going back to the audio signature and record more speech, the output would become better and better. I wanted to generate everything as fast as possible for my test, though, so I accepted the second version and moved on.
Just think of this tool as “The quick brown fox jumps over the lazy dog” for sound. As a designer, I often use that sentence to examine type because it contains every letter of the English alphabet, and letters are type’s most atomic components. Speech now has analogous tools with things like Assemble.ai.
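The pangram claim, by the way, is easy to verify in a couple of lines of Python:

```python
import string


def is_pangram(sentence: str) -> bool:
    """True if the sentence uses every letter of the English alphabet."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram("hello world"))                                  # False
```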
One thing to note here: I thought the degree of control the tool gives you over speech pace, pauses, emphasis, etc. was really cool.
It is worth noting here that I also entertained using Descript for this. Descript is a powerful way to use your ‘spoken word’ as a content-producing technique as well as an editing device. It’s incredibly robust. Ultimately, I deemed it too robust for my quick little test; however, stay tuned for an interesting experiment using only Descript coming soon.
Once I had my audio how I liked it, I downloaded it as a compiled audio file.
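If you wanted to fold this step into a script rather than use the web UI, Resemble also exposes a REST API. The sketch below is a rough, unverified outline: the endpoint path, field names and auth header format are assumptions based on Resemble’s v2 API, and `PROJECT_UUID` / `VOICE_UUID` are placeholders for values from your own account, so check the current docs before relying on any of it.

```python
# Rough sketch, not a verified integration: synthesizing a clip through
# Resemble AI's REST API instead of the web UI. Endpoint and field names
# are assumptions; consult Resemble's current API docs before use.
import json
import os
import urllib.request

PROJECT_UUID = "your-project-uuid"  # placeholder from your Resemble account
VOICE_UUID = "your-voice-uuid"      # placeholder: the trained voice print


def build_clip_request(text: str) -> dict:
    """Assumed payload shape: the script text plus the UUID of the
    trained voice that should speak it."""
    return {
        "title": "AI selfie narration",
        "body": text,
        "voice_uuid": VOICE_UUID,
    }


def create_clip(text: str) -> dict:
    """POST a clip-creation request; the response would reference the
    rendered audio file."""
    url = f"https://app.resemble.ai/api/v2/projects/{PROJECT_UUID}/clips"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_clip_request(text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token token={os.environ['RESEMBLE_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires RESEMBLE_API_KEY in the environment):
# create_clip("Welcome to my LinkedIn profile, my fellow digital natives!")
```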
Finally, I used D-ID to generate the video.
This was really the simplest of all the steps. Using D-ID, I was able to quickly add my “selfie” from Midjourney (I also added a few others just to see how it would animate those) and upload the audio from Resemble AI. The tool combined them into a video output.
Once both the image from Midjourney and the audio were uploaded, I clicked generate and D-ID rendered the video you saw in my LinkedIn post. Shockingly, once I had all my steps planned out, this took only minutes to do.
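This step, too, has an API equivalent for anyone who wants to chain the whole pipeline together. The sketch below is an assumption-laden outline of D-ID’s talks endpoint: the URL, payload shape and auth scheme are taken from memory of D-ID’s public docs, not verified here.

```python
# Rough sketch, not a verified integration: combining a still image and an
# audio track into a talking-head video via D-ID's REST API. The /talks
# endpoint and payload shape are assumptions; check D-ID's docs before use.
import json
import os
import urllib.request


def build_talk_request(image_url: str, audio_url: str) -> dict:
    """Assumed payload: a still image to animate plus a narration track."""
    return {
        "source_url": image_url,  # e.g. the Midjourney "selfie"
        "script": {"type": "audio", "audio_url": audio_url},  # e.g. Resemble audio
    }


def create_talk(image_url: str, audio_url: str) -> dict:
    """POST the talk request; the response would reference the rendered video."""
    request = urllib.request.Request(
        "https://api.d-id.com/talks",
        data=json.dumps(build_talk_request(image_url, audio_url)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {os.environ['DID_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires DID_API_KEY in the environment):
# create_talk("https://example.com/selfie.png", "https://example.com/narration.wav")
```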
So, there you have it. A step-by-step guide to my experiment with AI. These tools are incredible, and in the coming year, I’m excited to work with my colleagues to really dig into what the implications of each are for brands across industries.
As I was putting the finishing touches on this post, I thought it would be interesting to use an AI-generated masthead for it. Could ChatGPT generate a more descriptive prompt for Midjourney that would result in better images? As you can see above, after a few tries at this, Midjourney wasn’t getting it. It was understanding “prompt” to mean something like a “snippet” about my post. I was frustrated. Then I wondered, “What if I trained Midjourney as part of the prompt to make it more self-aware and contextually aware?”
Here’s what I entered into ChatGPT: “I want you to act as a prompt generator for Midjourney’s AI. Your job is to describe a visual scene of artificial intelligence as concisely as possible depicting a whimsical world where AI and humans exist in harmony. Give me 3 different and contrasting versions of this scene and include art style or artist’s name prefixed by “in the style of”. In the end ask me if I will end the current task or continue to another different prompt.”
Holy smokes it worked. Look at what I got in return.
I used one of them to generate the masthead image for this post. Can you guess which one?