The USC Shoah Foundation fuses a mixture of technologies to let you have a conversation with a Holocaust survivor.
Having a one-on-one conversation with someone remains one of the most immersive experiences a human being can share, and when you watch people speak with Holocaust survivor Pinchas Gutter, you feel that intensity.
Gutter, who was born in Lodz, Poland, in 1932, wears a dark vest buttoned up over a white dress shirt. Sitting on a chair, with hands resting on knees, he stares straight ahead while an audience asks questions about his life before, during and after the Holocaust.
“Do you remember any songs from your youth?” one member of the small crowd asks.
With barely a pause, Gutter responds by singing a Polish lullaby his mother used to sing to him. He does this with such gusto, the people in front of him listening intently, that it’s a marvel how an 83-year-old could emerge from the horrors of the Warsaw Ghetto and the Majdanek death camp with such an infectious joie de vivre. What’s even more amazing is that Gutter isn’t even there: his responses come from a recording projected onto an 81-inch screen.
A project produced by the USC Institute for Creative Technologies and the USC Shoah Foundation—home to the world’s largest archive of visual testimony from Holocaust and genocide survivors—combines language processing software, voice recognition technology, and visualization to let people interact with a Holocaust survivor in real time. Called New Dimensions in Testimony, the project is the California-based nonprofit’s effort to make 3D interactive exhibits publicly available, so that visitors can continue to ask questions of survivors for generations to come.
The Creators Project sat down with the director of the USC Shoah Foundation, Stephen Smith, to talk about how to explode the act of conversation using new technology.
The Creators Project: The USC Shoah Foundation is a visual history archive. What does that mean?
Stephen Smith: It’s an amazing archive: 53,000 video testimonies of the Holocaust and of genocides from around the world over the last 100 years. It’s 112,000 hours of material, which is 12.8 running years, recorded in 63 countries in 40 languages. The material is broken down into one-minute segments, and all of those segments have been indexed so they’re searchable by topic or interest.
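As a rough illustration of what "indexed to be searchable by topic" can mean (this is a sketch, not the Foundation's actual system; the segment IDs and topics below are hypothetical), a minimal version is an inverted index from topic to segment:

```python
from collections import defaultdict

# Hypothetical segment records: (segment_id, topics assigned by indexers).
segments = [
    ("seg-001", ["childhood", "Lodz", "family"]),
    ("seg-002", ["Warsaw Ghetto", "family"]),
    ("seg-003", ["liberation", "faith"]),
]

# Build an inverted index: topic -> list of segment ids that carry it.
index = defaultdict(list)
for seg_id, topics in segments:
    for topic in topics:
        index[topic.lower()].append(seg_id)

def search(topic):
    """Return every indexed segment tagged with the given topic."""
    return index.get(topic.lower(), [])

print(search("family"))  # ['seg-001', 'seg-002']
```

At archive scale the same idea is backed by a search engine rather than an in-memory dictionary, but the mapping from topic terms to one-minute segments is the core of it.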
Who’s your audience?
Principally scholars, educators, organizations and communities—communities being individuals who feel connected to the Holocaust or genocide. That’s the closest thing we have to a public audience.
And you’ve been looking at how to present this material in new ways.
We’re sharing a new project called New Dimensions in Testimony, which has been in the works since about 2010. Initially, the idea came from Heather Maio, an exhibition designer and concept developer. She wanted to create an interactive exhibition in which you could talk to Holocaust survivors.
So what does this interactivity mean? This is being able to speak to a projected image in real time?
We focused on making this a content first project, and later on, figuring out how we wanted to share or visualize that. So we started by asking basic questions like, How many questions do you ask? Do you really want it to be conversation? Does it have to be voice activated? Should it be a touch screen? How do we know that it’s going to enrich and give dignity to the experience?
So you’re using a few different technologies here—voice recognition, language processing and an image capture. What’s the image capture like?
You have multiple things that you could do with visualization. What we had on display at Sheffield Doc/Fest is a HERO camera shooting in 2D, presented on a life-size 81-inch LCD screen. There’s also the voice recognition.
And so Pinchas Gutter, for example, is on a screen. I ask a question into a microphone, and his image on the screen responds.
That’s kind of the most basic life-size version of the interactive, which we anticipate using.
Wow. How else could you present it?
Another version is using something like a Pepper’s ghost, where you’ve got the image projected into a 3D space. Since the space is 3D, you don’t realize that the image on the Pepper’s ghost screen is actually a 2D image. The eye can’t discern the 2D image in the 3D environment, so you could look at Pinchas seated in a library, and from where you’re seated, it looks like you’re looking into a library.
They use those techniques in haunted houses. Is anyone currently presenting the content in this format?
We’ve yet to see how museums are going to use it. One of our partners, the Illinois Holocaust Museum, is designing one now.
What else has been developed for the project visually?
Alternatively, we have developed a prototype of what’s called a multiscopic array, which uses an array of projectors to project images from an array of cameras. This means that you see the same type of image on the screen that you would see when wearing 3D glasses.
But without any glasses.
We also throw the image onto a translucent screen at floor level, so when you walk into the room it looks like he’s sitting on a chair. It’s in 3D, and you can move around an axis with him. That’s in prototype and not installed anywhere yet.
Why no VR headset?
We’re not rushing it into the VR space but it is VR compliant. I think one does get a sense of place in VR but you don’t get a sense of social learning. It’s a very lonely experience.
We’re interested in the sort of experience where people are sitting gathered around and learning together and asking questions.
Which is really important for a subject like this.
There’s nothing that replaces the conversation between two human beings. What’s given us some satisfaction is that the subjects themselves are satisfied with how they’re being represented: in conversation, as themselves. This has been really important to us, and we’ve been really careful not to put them in a situation where the voice recognition is breaking all the time and, therefore, making them look inadequate. Balancing all those issues has been challenging and rewarding.
How did you know what questions to ask?
In 2014, we filmed our pilot with Pinchas. We filmed for about a week, 5 hours a day using 50 HD cameras in a 180-degree arc. In this time, we asked him 1,000 questions in a 20-foot dome inside a light stage of 6,000 LED lights.
Ok, wait. What does the dome do?
It means that we can relight the subject.
Light in any given room has a different color temperature, so we can match the color temperature of the video to the color temperature of the room. It means that you can make your video look, from a lighting perspective, like it’s in the room.
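A minimal sketch of that kind of color matching, assuming you can measure the RGB of the same neutral reference (say, a gray card) both in the footage and under the room's light (the measurements below are made-up numbers, and this is a toy white-balance gain, not the relighting pipeline the light stage enables):

```python
def white_balance_gains(video_gray, room_gray):
    """Per-channel gains that map the video's neutral gray to the room's.

    video_gray, room_gray: (R, G, B) measurements of the same neutral
    reference as it appears in the footage and under room light.
    """
    return tuple(r / v for r, v in zip(room_gray, video_gray))

def apply_gains(pixel, gains):
    """Scale one RGB pixel by the gains, clamping to the 0-255 range."""
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

# Footage shot under warmer light than the room: cut red, boost blue.
gains = white_balance_gains(video_gray=(210, 200, 180), room_gray=(200, 200, 200))
print(apply_gains((210, 200, 180), gains))  # -> (200, 200, 200)
```

Applying the gains to every pixel shifts the whole frame toward the room's color temperature; real relighting with light-stage data goes much further, but the matching principle is the same.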
So you filmed Gutter and then what?
We took those 1,000 questions, put them in a database with the video, and set up a temporary installation at two different places, including the Los Angeles Museum of the Holocaust. This ran on a laptop and wasn’t voice activated. We asked people to ask Pinchas questions and had people sitting behind the screen listening, searching the database for the most appropriate answer. We called it 'Wizard of Oz' testing.
Why not get a machine to do that?
Our partner in this technology is the Institute for Creative Technologies. They were helping us develop a richer natural-language-processing approach, by which the visitor could ask a question and the system would understand it, but we first had to know what the range of subject matter was.
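The retrieval step Smith describes can be sketched very simply (this is an illustrative bag-of-words match, not ICT's actual natural-language pipeline; the recorded questions and clip names are hypothetical): score the visitor's question against every pre-recorded question and play the answer clip of the closest one.

```python
import math
from collections import Counter

# Hypothetical recorded questions mapped to the answer clips they elicit.
recorded = {
    "do you remember any songs from your youth": "clip_lullaby",
    "did your faith change as a result of the holocaust": "clip_faith",
    "what was life like in the warsaw ghetto": "clip_ghetto",
}

def bag_of_words(text):
    """Lowercased word counts for a question."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_clip(question):
    """Return the answer clip whose recorded question best matches."""
    scored = {clip: cosine(bag_of_words(question), bag_of_words(q))
              for q, clip in recorded.items()}
    return max(scored, key=scored.get)

print(best_clip("Do you remember songs from when you were young?"))  # -> clip_lullaby
```

A production system handles paraphrase and wording far more robustly, which is exactly why the team needed to know the range of subject matter first: retrieval only works if an answer for the visitor's question was recorded at all.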
What did the testing show you?
It showed us that we hadn’t asked all the right questions. Sometimes it was about the wording of questions: for example, ‘Do you believe in God?’ produced a different answer from the question people were more interested in, ‘Did your faith change as a result of the Holocaust?’
So we brought Pinchas back for filming in August 2014 and filmed for a further 3 days. We did a body scan and a high fidelity facial scan as well.
What will you use the scans for?

We don’t know. Some of it was really about making sure we had all the data that we might conceivably need. In the case of faces, it’s good for transitions and morphing between segments. If you’re going to do a digital morph, for example, you need an absolutely accurate, really high-definition facial expression, so the high-fidelity scanning was really important for that.
We also wanted to capture as much data as we possibly could, because we didn’t know where it was ultimately going to get used. We wanted to have options so it could be used in all those different ways.
We made it platform agnostic—so we weren’t making a video for one thing. We were capturing content that could be rendered in multiple ways on multiple platforms.