A team of researchers at USC is helping artificial intelligence to do something that humans have always had an easier time with: imagining the unseen.

The design technique could lead to fairer A.I., new medicines, and increased autonomous vehicle safety.

The artificial-intelligence system, developed by computer science professor Laurent Itti and PhD students Yunhao Ge, Sami Abu-El-Haija, and Gan Xin, in effect uses the attributes that it "knows" to think up a never-before-seen object.

Humans can separate their learned knowledge by attributes — for instance, shape, pose, position, color — and then recombine those factors to imagine a new object.

The team was inspired by a human's visual generalization capabilities. The USC researchers wanted to simulate that kind of human imagination in machines, said Ge, the study’s lead author.

"After humans see images of red boats and blue cars, they can decompose and recombine the learned knowledge to imagine novel images of red cars," Ge told Tech Briefs in the short Q&A below.

The USC team simulated this same process using neural networks.

The "imaginative" system outputs an idea that combines learned knowledge. A simplistic example, shown in the above image, demonstrates how the machine, trained on five separate letters that each have their own color and background, yielded an idea that combines all of the data: a brown lowercase "g" on a gold background.

The paper, titled Zero-Shot Synthesis with Group-Supervised Learning, was published at the 2021 International Conference on Learning Representations on May 7 of this year.

How to 'Disentangle' a Machine-Vision Snag

Machines are most commonly trained on sample features, like pixels, without considering the object’s attributes.

In the new study, the USC researchers attempted to overcome this limitation using a concept called disentanglement. The approach takes a group of sample images — rather than one sample at a time as traditional algorithms have done — and mines the similarity between them to achieve an idea called “controllable disentangled representation learning.”

Next, the knowledge is recombined to achieve “controllable novel image synthesis,” or what you might call imagination. It's a bit like robots on the big screen.

“Take the Transformers movie,” said Ge in an earlier press release. “It can take the shape of Megatron car, the color and pose of a yellow Bumblebee car, and the background of New York’s Times Square. The result will be a Bumblebee-colored Megatron car driving in Times Square, even if this sample was not witnessed during the training session.”
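The recombination in the Transformers example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the latent vectors, the `PARTITION` layout, and the `recombine` helper are hypothetical, and a real system would use a trained encoder to produce the codes and a decoder to turn the result into an image.

```python
# Hypothetical layout: which slice of a latent code holds which attribute.
PARTITION = {
    "shape": slice(0, 2),
    "color": slice(2, 4),
    "background": slice(4, 6),
}


def recombine(sources, partition=PARTITION):
    """Build a new latent code by borrowing each attribute's slice
    from the latent code named for that attribute in `sources`."""
    length = max(sl.stop for sl in partition.values())
    new_code = [0.0] * length
    for attr, sl in partition.items():
        new_code[sl] = sources[attr][sl]
    return new_code


# Made-up latent codes standing in for encoder outputs.
megatron = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
bumblebee = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
times_square = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5]

# Shape from one image, color from another, background from a third.
query = recombine(
    {"shape": megatron, "color": bumblebee, "background": times_square}
)
print(query)  # [1.0, 1.1, 2.2, 2.3, 3.4, 3.5]
```

The point of the sketch is that recombination only works if each attribute really lives in its own slice of the code, which is what disentanglement is meant to guarantee.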

In the interview with Tech Briefs below, Ge explains how disentanglement widens the opportunity for applications.

Tech Briefs: Traditional A.I. is trained on samples and image data, right? How do you train a machine to be “imaginative”? Does a system have to be trained on “learned” components or attributes that can be swapped in and out?

Yunhao Ge: To train a machine to be “imaginative,” we start from the assumption that humans can “factorize” their learned knowledge and freely “combine” the pieces to imagine a new, unseen scenario. For example, after humans see images of red boats and blue cars, they can decompose and recombine the learned knowledge to imagine novel images of red cars.

Based on this assumption, we propose a new learning paradigm, Group-Supervised Learning, which takes a group of samples as input and learns the similarity among them. Controllable disentangled representation learning simulates humans’ “knowledge factorization and recombination ability” and achieves zero-shot synthesis, which simulates the “imagination.”

In our paper, samples with attribute labels are examples of basic elements used to synthesize new unseen scenarios. In different tasks, the meaning of attributes may change. “Swap in and out” is one way of trying to simulate the recombination ability; you can use different ways to achieve this simulation under our group-supervised learning paradigm.
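As an editorial illustration of what "a group of samples with attribute labels" can look like in practice, the sketch below links every pair of samples that share an attribute value, the multigraph view mentioned in the team's video presentation. The sample names, labels, and the `shared_attribute_edges` helper are hypothetical and are not taken from the paper's code.

```python
from itertools import combinations

# Toy labeled samples (names and labels are made up for illustration).
samples = {
    "img0": {"identity": "car", "color": "red", "background": "city"},
    "img1": {"identity": "car", "color": "blue", "background": "beach"},
    "img2": {"identity": "boat", "color": "red", "background": "beach"},
}


def shared_attribute_edges(samples):
    """Return (sample_a, sample_b, attribute) for every attribute value
    that two samples have in common. A pair may contribute zero, one,
    or several edges, which is what makes the structure a multigraph."""
    edges = []
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(samples.items(), 2):
        for attr, value in attrs_a.items():
            if attrs_b.get(attr) == value:
                edges.append((name_a, name_b, attr))
    return edges


edges = shared_attribute_edges(samples)
for edge in edges:
    print(edge)
```

Each edge identifies a pair of images and the attribute they share, which is the information a swap-based training step needs to know which latent dimensions it may exchange.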

Tech Briefs: What other specific applications do you envision for this kind of system? Which applications call for the most “imagination”?

Yunhao Ge:

  1. Drug discovery. We can combine learned functions from existing drugs to synthesize or discover new drugs with desired functions.
  2. Fair decisions. Based on our controllable disentanglement ability, we can factor the undesired attributes out and prevent the system from considering them during decision making. For instance, race and gender should not be considered in some decisions, to ensure fairness. Our group-supervised learning can first disentangle the race and gender information and then use only the remaining information during a decision.
  3. Data augmentation. Our method can be used to create new data by imagination.
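The fairness idea in item 2 can be sketched as follows. This is a minimal illustration, assuming a latent code that has already been disentangled into named slices; the `PARTITION` layout, the attribute names, and the `drop_attributes` helper are hypothetical.

```python
# Hypothetical partition of an already-disentangled latent code.
PARTITION = {
    "race": slice(0, 2),
    "gender": slice(2, 4),
    "other": slice(4, 8),
}


def drop_attributes(code, excluded, partition=PARTITION):
    """Return only the latent dimensions whose attribute is not excluded."""
    kept = []
    for attr, sl in partition.items():
        if attr not in excluded:
            kept.extend(code[sl])
    return kept


# Made-up latent code standing in for an encoder output.
code = [0.3, 0.7, 0.9, 0.1, 0.5, 0.2, 0.8, 0.4]

# Only these features would be passed on to the decision model.
features = drop_attributes(code, excluded={"race", "gender"})
print(features)  # [0.5, 0.2, 0.8, 0.4]
```

Dropping slices this way only removes the sensitive information to the extent that the disentanglement actually succeeded; any leakage into the remaining dimensions would still reach the decision model.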

Tech Briefs: In what ways can an imaginative A.I. help a self-driving car?

Yunhao Ge: We can use the learned experience to synthesize or imagine extreme or dangerous situations, which can teach the self-driving system to avoid those situations and help improve robustness and safety.

Similar to the fairness problem, we do not want the self-driving system to consider some factors during a decision. We can use the controllable disentangled representation learning ability to separate out the irrelevant factors and exclude them from the decision, which helps eliminate decision bias and improves safety.

Tech Briefs: What safeguards this system from developing an imaginative, but dangerous design idea?

Yunhao Ge: This learning paradigm is controlled by the user; the designer should be virtuous.

Tech Briefs: What are you working on next?

Yunhao Ge: We want to make our method more general by relaxing its requirements on datasets and applications. We also want to extend our method to different data modalities and tasks.




Transcript

00:00:00 Hello everyone, I'm Yunhao Ge from USC. Today I will present our work, Zero-Shot Synthesis with Group-Supervised Learning. This work was done with Sami, Gan Xin, and Laurent Itti. Let's start with the motivation: visual cognition. Humans can envision a novel visual object even if they have never seen it before. We want to empower our machines with a similar

00:00:24 ability: zero-shot synthesis. Knowledge factorization may help humans envision. For example, after we see two images, we can factorize the attributes, like the shape and color, and then combine the knowledge to envision a new object. For machines, can we use neural networks to simulate

00:00:44 this process? Yes. We call it controllable disentangled representation learning, which can be achieved by our new learning framework, group-supervised learning. Group-supervised learning allows us to decompose inputs into a disentangled representation with swappable components that can be

00:01:04 recombined to synthesize new samples. For disentangled representation learning, VAEs achieve disentanglement implicitly, but they cannot control the disentanglement with no attribute labels. StarGAN and ELEGANT can control attributes in synthesis but find it hard to maintain global consistency.

00:01:28 Our method solves these problems and allows easy implementation and stable training. Let's move on to the problem statement and approach. Given a dataset where each image has attribute labels, we encode it as a multigraph where edges indicate a shared attribute class. Each pair of nodes

00:01:52 shares zero or more attribute values. Our goal is to synthesize a novel image for an arbitrary query. You can see the top image is the combination of the attributes from the bottom images. Given a group of images represented as a multigraph, we start from a feed-forward autoencoder. To achieve controllable

00:02:16 disentanglement, we first predefine a disentangled partition in the latent space, for example, identity information stored in red, pose in yellow, and background in green, and then achieve it with one-overlap attribute swap. For example, we sample two images with the same identity

00:02:36 and get the latent codes. Then we swap the shared ID-related latent dimensions and form new latent codes and generated images. Because they have the same ID, swapping the ID-related dimensions should not change the images. Similarly, for two images with the same pose, if we swap the pose-related dimensions,

00:02:59 the images should stay unmodified. We also swap background. Importantly, swapping all of the attributes forces the neural network to actually disentangle them instead of finding some way to cheat. Besides that, for two images that share no attributes, we randomly swap one attribute

00:03:20 and generate two new images with no ground truth. Then we swap the same attribute again and recover the original input images. This cycle attribute swap implicitly enforces the disentanglement. During training, we use only reconstruction loss, with easy implementation

00:03:43 and stable training. Let's see the experiments and results. On the iLab-20M dataset, we want to combine the attributes from the provided images to synthesize new images. Our method can satisfy the query and can also maintain semantic consistency. For example, when rotating the main object, the

00:04:06 background also rotates with it. Here are results on the Fonts dataset, with five attributes to disentangle. Here are the results on RaFD, with three attributes to disentangle. For disentanglement analysis, we calculate a model-based confusion matrix between attributes: we use the

00:04:27 corresponding attribute to predict attribute labels. Our method has high accuracy on the diagonal and low accuracy off the diagonal. We also show our method can be used as data augmentation and help downstream classification tasks. We also publish a new dataset, Fonts. You are welcome to use it for fast testing

00:04:50 and idea iteration on disentangled representation learning and zero-shot synthesis. Please visit our paper and website for more details. Thanks.
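The cycle attribute swap described in the presentation can be illustrated with a toy Python sketch. It is a simplification under stated assumptions: the latent codes, the `POSE` slice, and both helper functions are hypothetical, and the encoder and decoder networks are left out. In the described method, images are generated between the two swaps, so recovering the inputs is a real training signal; here the codes are swapped directly, so the loss is trivially zero.

```python
def swap_attribute(code_a, code_b, sl):
    """Return copies of two latent codes with the dimensions in `sl` exchanged."""
    new_a, new_b = list(code_a), list(code_b)
    new_a[sl], new_b[sl] = code_b[sl], code_a[sl]
    return new_a, new_b


def reconstruction_loss(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


# Hypothetical location of the pose-related dimensions.
POSE = slice(2, 4)

# Made-up latent codes for two images that share no attributes.
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

# First swap: two new codes with no ground-truth image to compare against.
a1, b1 = swap_attribute(a, b, POSE)

# Second swap of the same attribute should bring back the originals.
a2, b2 = swap_attribute(a1, b1, POSE)

cycle_loss = reconstruction_loss(a, a2) + reconstruction_loss(b, b2)
print(a1, b1)
print(cycle_loss)  # 0.0
```

With networks in the loop, a nonzero cycle loss would indicate that information about the swapped attribute had leaked into other latent dimensions.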