- The Prompt Craftsman
21st issue: Speak to me like a human being 🤖
RT-2, AudioCraft and more

🌯 Wrap for the week
A close look at all the main headlines in AI
This is what Robotics combined with GPT looks like.
Technically speaking, this is news from the week before; however, I think it is still an interesting and important piece to share.
Google's DeepMind has unveiled the Robotics Transformer 2 (RT-2), a revolutionary vision-language-action (VLA) model that equips robots to carry out tasks they haven't specifically trained for.
Learning from the Web: RT-2, unlike its predecessor RT-1, gleans general concepts and ideas from text and images found across the web, instead of relying solely on hefty amounts of specific robotic data. This paves the way for robots that are context-aware and adaptable, allowing them to operate in various environments and situations with significantly less training.
Taking the Leap Beyond Training: In what seems like a page out of a sci-fi novel, RT-2 is proficient at interpreting new commands and carrying out tasks it's never seen before. For example, rather than being specifically trained to identify and discard trash, RT-2 already knows what trash is and how to dispose of it based on web data. The results are impressive, with RT-2 almost doubling the performance of RT-1 in tackling novel, unseen scenarios.
A Future Full of Context-Aware Robots: The advancements made by DeepMind suggest a future where robots can reason, interpret information, and perform a variety of tasks based on a specific context. Picture warehouse robots that handle each object uniquely, accounting for type, weight, fragility, and other factors. With the AI-driven robotics market forecasted to expand from $6.9 billion in 2021 to $35.3 billion in 2026, this technology might just be the springboard the industry needs.
If you are interested in further details about RT-2, check out the research blog post from Google DeepMind: https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action
Main news credit:
DeepMind unveils RT-2, a new AI that makes robots smarter from VentureBeat
Meta continues its development of AI at a fast pace
There are a bunch of developments from Meta in AI this week, so let's check them out:
AudioCraft
Meta has open-sourced AudioCraft, a novel AI toolkit designed to generate music and audio effects from text prompts.
The toolkit includes:
- AudioGen for sound effects
- MusicGen for melodies
- EnCodec, an advanced audio compression codec
While these AI-driven creations may not replace professional audio production yet, they mark a significant advancement in the less-explored field of generative audio models. Despite challenges, Meta hopes the release will encourage community-wide experimentation, paving the way for more advanced and accessible generative audio tools. The MusicGen model was trained ethically using music owned or licensed by Meta, addressing recent controversies over generative AI.
While we are on the topic of generative audio, check out the AudioGen examples from its paper: https://felixkreuk.github.io/audiogen/
If you are keen to use AudioCraft, check out the GitHub repository here: https://github.com/facebookresearch/audiocraft
AI-powered Chatbot with different personas
Meta is reportedly set to release AI-powered chatbots with different personas, which could debut as soon as next month on platforms including Facebook and Instagram.
These chatbots, with diverse personas such as a surfer or Abraham Lincoln, are aimed at increasing user engagement by offering new ways to search the platform and receive recommendations. The technology could potentially gather more user data, raising privacy concerns.
This move follows Snapchat's similar integration of an AI chatbot earlier this year. The development aligns with Meta's stated goal of building new AI-powered products, with more details anticipated at its Connect developer event in September.
I guess Meta has seen the incredible growth of character.ai and simply wants to follow this idea.
Main news credit:
Meta releases open source AI audio tools, AudioCraft from Ars Technica
Meta is reportedly preparing to release AI-powered chatbots with different personas from TechCrunch
Other top AI headlines for the week
OpenAI Drops Huge Clue About GPT-5 from Futurism
Still no particular date in mind for when GPT-5 is coming, but the trademark filing does make it feel like it is coming soon.
OpenAI has filed a trademark application for:
“GPT-5”
which includes “software for”:
“the artificial production of human speech and text”
“conversion of audio data files into text”
"voice and speech recognition"
"machine-learning based language and speech processing"
👀
— YK aka CS Dojo 📺🐦 (@ykdojo)
2:07 AM • Aug 1, 2023
Google's 'mind-reading' AI can tell what music you listened to based on your brain signals from LiveScience
I can’t think of any particular use cases for this yet but the technology does sound really cool.
Instagram reportedly developing new AI features, including an AI-generated image detector from ZDNet
I still think labelling the images could be helpful for most people, given that not all of us know how to tell them apart nowadays. However, there is always the concern of how companies or organisations could use the labelling for their own purposes and gain. What do you think should be the way to go?
🎬️ Guess that movie
Adding a bit of fun to your weekend reading, let’s guess the famous movie from the 10-emoji description created by ChatGPT:
👮♂️🤖🔬📚🚘💥👁️🏙️👊⚠️
If it is a bit hard to figure out, you can find the answer with some explanation from ChatGPT at the end of this newsletter.
🧰 Toolbox
Check out this week’s tools and resources that give you more superpowers with prompt crafting!
1. 📚️ The state of AI in 2023: Generative AI’s breakout year
This is an interesting report from McKinsey. More than 1,600 people were surveyed for it, and the data shows how AI is being adopted at the moment.
2. 🔧 Spell
If you want to create your own intelligent stock analyst to keep an eye on any stock of your choice, this is the tool to make that happen.
3. 🔧 Quivr
I heard about this tool at a meetup a couple of weeks ago and recently saw it surface in the AI circle again. It basically works like your second brain: you can dump information there and retrieve what you need via chat. For people dealing with a lot of information (which is basically everyone), this tool could be quite useful. Plus, it is open source as well.
👋 Your next X/Twitter friend
Connect with great people in the industry to supercharge your growth
Are you using ChatGPT to learn a subject?
Great.
But you can do better.
Today's mission: Create a Smart Study Buddy using @LangChainAI & @streamlit. Let me show you how (code included) 🧵
#AI #AIp
— Joris de Jong (@JorisTechTalk)
11:54 AM • Aug 4, 2023
🖼️ Image of the week
LK99 has undoubtedly taken the world by storm this week!
ngl #LK99 has got me in a chokehold
— christina (@cszhu)
5:08 PM • Aug 1, 2023
🎬️ Guess that movie - answer
The answer is "I, Robot"! 🥳
Did you guess that right? Here is ChatGPT’s explanation:
👮♂️🤖🔬📚🚘💥👁️🏙️👊⚠️
This summary captures the main elements of the story: a detective (👮♂️), robots (🤖), science and innovation (🔬), the Three Laws of Robotics (📚), a self-driving car (🚘), explosions and action (💥), the unique robot with a different 'eye' (👁️), the futuristic city (🏙️), physical confrontations (👊), and danger or threat (⚠️).
That is it for the week!
Until I see you next time, stay awesome my friend!
Cheers,
Minjie