<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xml" href="https://pandagongfu.github.io//feed.xslt.xml"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="http://jekyllrb.com" version="3.3.1">Jekyll</generator><link href="https://pandagongfu.github.io//feed.xml" rel="self" type="application/atom+xml" /><link href="https://pandagongfu.github.io//" rel="alternate" type="text/html" /><updated>2017-01-06T14:26:21+00:00</updated><id>https://pandagongfu.github.io///</id><title type="html">Yao Wu</title><subtitle>Data Scientist</subtitle><entry><title type="html">Paint Your Video in Style</title><link href="https://pandagongfu.github.io//Kojak/" rel="alternate" type="text/html" title="Paint Your Video in Style" /><published>2016-12-12T00:00:00+00:00</published><updated>2016-12-12T00:00:00+00:00</updated><id>https://pandagongfu.github.io//Kojak</id><content type="html" xml:base="https://pandagongfu.github.io//Kojak/">&lt;h2 id=&quot;intro&quot;&gt;Intro&lt;/h2&gt;
&lt;p&gt;This is the passion project that I did for the Metis bootcamp. It leverages convolutional neural networks to perform artistic style transfer on videos.&lt;/p&gt;

&lt;h2 id=&quot;motivation&quot;&gt;Motivation&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=47h6pQ6StCk&quot;&gt;&lt;img src=&quot;../images/style_transfer/loving_vincent.jpg&quot; alt=&quot;vincent&quot; /&gt;&lt;/a&gt;&lt;br /&gt;
Loving Vincent is the world’s first fully painted animated movie; a team of 115 artists painstakingly painted every single one of its 67,000 frames. The size of all these oil paintings combined would cover the entire island of Manhattan. This might be the best way to pay tribute to Van Gogh; however, it is not scalable. Given the recent developments in computer vision, can we create a more strategic solution?&lt;/p&gt;

&lt;h2 id=&quot;style-transfer-with-static-images&quot;&gt;Style Transfer with Static Images&lt;/h2&gt;
&lt;p&gt;The idea of style transfer is to create a stylized image that combines the style of an artwork with the content of an image. Here are a few examples.&lt;br /&gt;
&lt;img src=&quot;../images/style_transfer/boat_scream.png&quot; alt=&quot;ex1&quot; /&gt;
&lt;img src=&quot;../images/style_transfer/wolf_bamboo.png&quot; alt=&quot;ex2&quot; /&gt; 
&lt;img src=&quot;../images/style_transfer/beach_wave.png&quot; alt=&quot;ex3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These stylized images capture the color scheme and texture of the painting while preserving the content of the picture. Mobile apps such as &lt;a href=&quot;http://prisma-ai.com/&quot;&gt;Prisma&lt;/a&gt; and &lt;a href=&quot;http://www.pikazoapp.com/&quot;&gt;Pikazo&lt;/a&gt; popularized style transfer in the last year. It all started with the paper &lt;a href=&quot;https://arxiv.org/pdf/1508.06576v2.pdf&quot;&gt;A Neural Algorithm of Artistic Style&lt;/a&gt;, which proposes extracting content and style features from a convolutional neural network and creating a stylized image by minimizing a loss function.&lt;/p&gt;

&lt;h3 id=&quot;feature-extraction&quot;&gt;Feature Extraction&lt;/h3&gt;
&lt;p&gt;Convolutional neural networks perform exceptionally well in image classification tasks. They consist of layers of neurons that each extract certain features, and the output of each layer can be considered a filtered version of the input image. &lt;br /&gt;
If we reconstruct the content of an input image at each layer by visualizing the output, we can see that the network captures increasingly abstract representations as we progress towards higher layers. In this case, it appears that feature maps from layers 4 and 5 would be appropriate content representations.&lt;br /&gt;
For style representations, the paper calculates the feature correlations (Gram matrices) between filter responses. The visualization shows that the correlations match the style at increasing scales as we go deeper into the network, and all layers are used to represent style.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;../images/style_transfer/feature_extraction.png&quot; alt=&quot;cnn&quot; /&gt;&lt;/p&gt;
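&lt;p&gt;The Gram matrix at the heart of the style representation is simple to compute. A minimal NumPy sketch (the actual implementation operates on backend tensors inside the network):&lt;/p&gt;

```python
import numpy as np

def gram_matrix(feature_map):
    # feature_map: (height, width, channels) activations from one conv layer
    h, w, c = feature_map.shape
    features = feature_map.reshape(h * w, c)  # flatten the spatial dimensions
    return features.T @ features              # (c, c) channel correlations
```

&lt;p&gt;Two feature maps with similar textures produce similar Gram matrices even when the content is arranged differently, which is why this statistic captures style rather than content.&lt;/p&gt;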

&lt;h3 id=&quot;loss-function&quot;&gt;Loss Function&lt;/h3&gt;
&lt;p&gt;The loss function consists of two parts: content loss and style loss.&lt;br /&gt;
The content loss is the sum of squared element-wise errors between the feature responses at layer 5 of the content image and those of the stylized image, whereas the style loss is the sum of squared errors between the Gram matrices at all layers.&lt;br /&gt;
The two terms need appropriate relative weights in order to perform properly, and choosing them requires some trial and error.&lt;br /&gt;
The stylized image is found by minimizing the loss function and the optimization is performed using L-BFGS.&lt;/p&gt;
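&lt;p&gt;The two loss terms and their weighted combination can be sketched as follows (a NumPy sketch; the normalization follows the paper, and the alpha/beta weights shown are illustrative defaults, not the values I used):&lt;/p&gt;

```python
import numpy as np

def content_loss(f_content, f_out):
    # sum of squared element-wise errors between feature responses
    return 0.5 * np.sum((f_out - f_content) ** 2)

def style_loss(gram_style, gram_out, n_filters, map_size):
    # squared error between Gram matrices, normalized by layer size
    return np.sum((gram_out - gram_style) ** 2) / (4.0 * n_filters**2 * map_size**2)

def total_loss(f_content, f_out, gram_style, gram_out, n_filters, map_size,
               alpha=1.0, beta=1000.0):
    # alpha/beta balance content against style; they require trial and error
    return (alpha * content_loss(f_content, f_out)
            + beta * style_loss(gram_style, gram_out, n_filters, map_size))
```

&lt;p&gt;In practice these quantities are expressed symbolically so that L-BFGS can use the gradient of the total loss with respect to the pixels of the stylized image.&lt;/p&gt;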

&lt;h2 id=&quot;style-transfer-with-videos&quot;&gt;Style Transfer with Videos&lt;/h2&gt;
&lt;p&gt;Style transfer on static images has been widely researched; studies on video style transfer, however, are fairly new. Although a video is a collection of static images, we cannot simply combine individually stylized frames, because randomness in the initialization causes unpleasant background flickering.&lt;/p&gt;

&lt;p&gt;In order to mitigate this artifact, the paper &lt;a href=&quot;https://arxiv.org/pdf/1604.08610v2.pdf&quot;&gt;Artistic Style Transfer for Videos&lt;/a&gt; proposes a few fixes. Their implementation was coded in Lua with Torch as the backend; I modified the Keras neural style transfer example to incorporate the image warping and temporal constraints.&lt;br /&gt;
1. Image warping initializes each frame from the previous stylized frame, taking into consideration the optical flow between adjacent frames. &lt;br /&gt;
2. The temporal constraint stabilizes the regions that do not contain moving objects.&lt;/p&gt;
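&lt;p&gt;The temporal constraint penalizes the stylized frame for deviating from the warped previous frame wherever the optical flow is reliable. A minimal NumPy sketch (the per-pixel weights come from the flow consistency check, which I omit here):&lt;/p&gt;

```python
import numpy as np

def temporal_loss(x, warped_prev, weights):
    # x: current stylized frame; warped_prev: previous stylized frame
    # warped along the optical flow; weights: 1 where the flow is
    # reliable (no motion boundaries or disocclusions), 0 elsewhere
    return np.sum(weights * (x - warped_prev) ** 2) / x.size
```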

&lt;p&gt;The workflow is as follows:&lt;br /&gt;
1. Scrape 20k images of paintings and photographs from Flickr to fine-tune the last convolutional layer of VGG16, so that it can distinguish paintings from photographs with 88% accuracy.&lt;br /&gt;
2. Generate the optical flow and the weights for the temporal constraints using &lt;a href=&quot;http://lear.inrialpes.fr/src/deepmatching/&quot;&gt;DeepMatching&lt;/a&gt;. &lt;br /&gt;
3. Perform style transfer on frames with initialization and temporal loss.&lt;/p&gt;
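&lt;p&gt;The initialization in step 3 warps the previous stylized frame along the flow field. A simplified nearest-neighbor sketch in NumPy (real implementations use bilinear interpolation, and the sign convention of the flow depends on the flow estimator):&lt;/p&gt;

```python
import numpy as np

def warp_frame(prev_stylized, flow):
    # prev_stylized: (h, w) or (h, w, 3) frame; flow: (h, w, 2) offsets
    # in (dx, dy) order; each output pixel samples the previous frame
    # at its flow-displaced position (rounded to the nearest pixel)
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    return prev_stylized[src_y, src_x]
```

&lt;p&gt;Starting the optimization from this warped frame, instead of from random noise, is what removes most of the background flickering.&lt;/p&gt;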

&lt;h2 id=&quot;stylized-videos&quot;&gt;Stylized Videos&lt;/h2&gt;
&lt;p&gt;Please enjoy the following videos that I created for this project.&lt;/p&gt;

&lt;h3 id=&quot;flying-bat-in-starry-night-style&quot;&gt;Flying Bat in Starry Night Style&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=5QdC1OQ0xe4&quot;&gt;&lt;img src=&quot;../images/style_transfer/flying_bat.jpg&quot; alt=&quot;bat&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;game-of-thrones-gradual-style-transitions-between-muse-and-scream&quot;&gt;Game of Thrones: Gradual Style Transitions between Muse and Scream&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=y0ddOVEHUO4&quot;&gt;&lt;img src=&quot;../images/style_transfer/GameofThrone.jpg&quot; alt=&quot;GOT&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;current-research-and-progress&quot;&gt;Current Research and Progress&lt;/h2&gt;
&lt;p&gt;Frame-by-frame optimization is a very slow way to perform style transfer. &lt;a href=&quot;https://arxiv.org/pdf/1610.07629v1.pdf&quot;&gt;Recent research&lt;/a&gt; suggests that the amount of information contained in style is a tiny fraction of the information contained in the overall convolutional network. It is possible to pre-train a style transfer network so that every frame takes only a few seconds to process.&lt;br /&gt;
Facebook is testing caffe2go, a real-time video style transfer mobile app, in a few countries, and it will soon be deployed across a wider range of geographic locations.&lt;br /&gt;
It appears to be a not-so-distant possibility to binge-watch our favorite shows in our favorite style.&lt;/p&gt;</content><summary type="html">Intro
This is the passion project that I did for the Metis bootcamp. It leverages convolutional neural networks to perform artistic style transfer on videos.</summary></entry><entry><title type="html">An analysis of human factors in Hollywood</title><link href="https://pandagongfu.github.io//Luther/" rel="alternate" type="text/html" title="An analysis of human factors in Hollywood" /><published>2016-10-08T00:00:00+00:00</published><updated>2016-10-08T00:00:00+00:00</updated><id>https://pandagongfu.github.io//Luther</id><content type="html" xml:base="https://pandagongfu.github.io//Luther/">&lt;h2 id=&quot;project-description&quot;&gt;Project Description&lt;/h2&gt;
&lt;p&gt;Hollywood is a complex human network. Producers, directors, writers, actors and cinematographers are working closely with each other to create movies. Their individual talents and the level of cooperation between them potentially have huge impacts on the outcome of movies.    &lt;br /&gt;
My analysis identifies the factors that have statistical significance and quantifies their impacts on the revenue and profitability of movies.&lt;/p&gt;

&lt;h2 id=&quot;web-scraping&quot;&gt;Web Scraping&lt;/h2&gt;
&lt;p&gt;I scraped the data from &lt;a href=&quot;http://www.boxofficemojo.com/&quot;&gt;Box Office Mojo&lt;/a&gt;. As a first-time scraper, I definitely spent more time than I’d expected, but I learned to add a random sleep time to reduce the probability of being denied access.&lt;/p&gt;
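&lt;p&gt;The random-sleep trick can be wrapped in a tiny helper. A sketch (the function and parameter names are mine, not from the project code):&lt;/p&gt;

```python
import random
import time

def polite_get(url, fetch, min_wait=1.0, max_wait=3.0):
    # pause for a random interval before each request so the traffic
    # looks less mechanical and is less likely to be denied access
    time.sleep(random.uniform(min_wait, max_wait))
    return fetch(url)
```

&lt;p&gt;Here &lt;code&gt;fetch&lt;/code&gt; is whatever performs the actual request, e.g. a &lt;code&gt;requests.get&lt;/code&gt; call.&lt;/p&gt;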

&lt;h2 id=&quot;feature-engineering&quot;&gt;Feature Engineering&lt;/h2&gt;
&lt;p&gt;This is a crucial step in the modeling process: the quality of the features is a decisive factor in the quality of the model.
The following features are created in this analysis:&lt;br /&gt;
- Number of stars in the cast: cumulative box office gross for all actors in the database are calculated for a rolling 10-year period. If an actor is ranked in the top 100, then this person is deemed to be a star. For a specific movie, we look at the 10-year period ending in the year that’s 2 years prior to its release year (assuming a 2-year production cycle on average) and count the number of stars in the cast (only the top 4 listed actors are considered).&lt;br /&gt;
- Director score: this is a measure based on the cumulative box office for the director in the same 10-year period.&lt;br /&gt;
- Prior cooperations between director&amp;amp;actor, director&amp;amp;producer, and producer&amp;amp;writer: the assumption is that people who cooperate more than once tend to have great working chemistry which will translate into success in their future collective efforts. &lt;br /&gt;
- Prior experience in the same genre: for each genre, we count the number of times the director/actor worked in that genre prior to the movie’s production.&lt;/p&gt;
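&lt;p&gt;The star-count feature can be sketched as follows (a hypothetical helper; the data structures and names are illustrative, not the project’s actual code):&lt;/p&gt;

```python
def star_count(cast, gross_by_actor_year, release_year, top_n=100, window=10):
    # gross_by_actor_year: {actor: {year: box-office gross}}
    # the 10-year ranking window ends 2 years before the release year,
    # assuming an average 2-year production cycle
    end = release_year - 2
    years = range(end - window + 1, end + 1)
    totals = {actor: sum(by_year.get(y, 0) for y in years)
              for actor, by_year in gross_by_actor_year.items()}
    stars = set(sorted(totals, key=totals.get, reverse=True)[:top_n])
    # only the top 4 billed actors are considered
    return sum(1 for actor in cast[:4] if actor in stars)
```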

&lt;h2 id=&quot;model-results&quot;&gt;Model Results&lt;/h2&gt;
&lt;p&gt;The final model is derived from the following steps:&lt;br /&gt;
1. Run OLS using the features described above.&lt;br /&gt;
2. Perform grid search on Lasso as a guidance on the choice of features.&lt;br /&gt;
3. Run a second OLS on the reduced set of features.&lt;br /&gt;
4. Use a random forest to rank the features by importance.&lt;/p&gt;
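&lt;p&gt;Steps 2 and 4 can be sketched with scikit-learn (a sketch under assumed interfaces; I used a grid search over Lasso, while this example uses the closely related &lt;code&gt;LassoCV&lt;/code&gt; for brevity):&lt;/p&gt;

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

def select_features(X, y, feature_names):
    # Lasso with cross-validated regularization as guidance on feature choice
    lasso = LassoCV(cv=5).fit(X, y)
    kept = [n for n, c in zip(feature_names, lasso.coef_) if abs(c) > 1e-8]
    # a random forest then ranks the features by importance
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return kept, [feature_names[i] for i in order]
```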

&lt;p&gt;The features that are significant at 95% level are listed below in the order of their importance:&lt;br /&gt;
- Director score&lt;br /&gt;
- Producer &amp;amp; Writer cooperation &lt;br /&gt;
- Star presence&lt;br /&gt;
- Director &amp;amp; Actor cooperation&lt;br /&gt;
- Actors’ prior experiences in Action/Adventure&lt;br /&gt;
- Directors’ prior experiences in Sci-Fi/Fantasy&lt;/p&gt;

&lt;h2 id=&quot;future-research&quot;&gt;Future Research&lt;/h2&gt;
&lt;p&gt;The cooperation features turned out to be among the most important factors in the model. One interpretation is that many of the second cooperations are sequels, which usually have higher budgets than the originals; it is therefore not surprising that these features have high significance. To isolate the sequel effect, we would need to run the analysis on non-sequel movies only.&lt;/p&gt;

&lt;p&gt;Although movie gross revenues are relatively easy to capture with a straightforward linear model, ROI is more difficult. The score of the linear model was low enough that it was not at all explanatory. A regression model is clearly not the best choice for ROI; classification or deep learning might be a better option.&lt;/p&gt;</content><summary type="html">Project Description
Hollywood is a complex human network. Producers, directors, writers, actors and cinematographers are working closely with each other to create movies. Their individual talents and the level of cooperation between them potentially have huge impacts on the outcome of movies.    
My analysis identifies the factors that have statistical significance and quantifies their impacts on the revenue and profitability of movies.</summary></entry><entry><title type="html">First week at Metis</title><link href="https://pandagongfu.github.io//Week1/" rel="alternate" type="text/html" title="First week at Metis" /><published>2016-09-26T00:00:00+00:00</published><updated>2016-09-26T00:00:00+00:00</updated><id>https://pandagongfu.github.io//Week1</id><content type="html" xml:base="https://pandagongfu.github.io//Week1/">&lt;p&gt;First week at &lt;a href=&quot;http://www.thisismetis.com/&quot;&gt;Metis&lt;/a&gt; is in the books. I would say that it has matched my expectations so far and I’m glad that I made the decision to join. Our cohort comes from diverse backgrounds and it was a pleasure to have the opportunity to work with some of them on our first project.&lt;br /&gt;
Our mornings started with pair programming exercises. The problems are not difficult, but it’s inspiring that the instructor makes us think about improving the complexity of our algorithms. This is something that I did not pay much attention to before, coming from a non-CS background. However, I believe it’s critical to choose blazingly fast algorithms when dealing with big data sets.&lt;br /&gt;
The project took a major chunk of our time. I learned a tremendous amount from my teammates, such as the BeautifulSoup, sqlite3, and seaborn Python packages, and Carto. I’m still working on understanding the HDF data format and how it relates to pandas panels.&lt;br /&gt;
The lectures were fairly light in the first week and I hope the pace will pick up in the coming weeks. Really looking forward to week 2. The to-do list includes:&lt;br /&gt;
- Clean up code for project 1.&lt;br /&gt;
- Wrap up the coursera algo design course.&lt;br /&gt;
- Understand regular expressions better.&lt;/p&gt;</content><summary type="html">First week at Metis is in the books. I would say that it has matched my expectations so far and I’m glad that I made the decision to join. Our cohort comes from diverse backgrounds and it was a pleasure to have the opportunity to work with some of them on our first project.
Our mornings started with pair programming exercises. The problems are not difficult, but it’s inspiring that the instructor makes us think about improving the complexity of our algorithms. This is something that I did not pay much attention to before, coming from a non-CS background. However, I believe it’s critical to choose blazingly fast algorithms when dealing with big data sets.
The project took a major chunk of our time. I learned a tremendous amount from my teammates, such as the BeautifulSoup, sqlite3, and seaborn Python packages, and Carto. I’m still working on understanding the HDF data format and how it relates to pandas panels.
The lectures were fairly light in the first week and I hope the pace will pick up in the coming weeks. Really looking forward to week 2. The to-do list includes:
- Clean up code for project 1.
- Wrap up the coursera algo design course.
- Understand regular expressions better.</summary></entry><entry><title type="html">Where do we find potential donors?</title><link href="https://pandagongfu.github.io//Benson/" rel="alternate" type="text/html" title="Where do we find potential donors?" /><published>2016-09-25T00:00:00+00:00</published><updated>2016-09-25T00:00:00+00:00</updated><id>https://pandagongfu.github.io//Benson</id><content type="html" xml:base="https://pandagongfu.github.io//Benson/">&lt;h2 id=&quot;project-description&quot;&gt;Project Description&lt;/h2&gt;
&lt;p&gt;WomenTechWomenYes (WTWY) has an annual gala at the beginning of the summer. The management has decided to place street teams at entrances to subway stations to collect email addresses and those who sign up are sent free tickets to the gala.&lt;br /&gt;
Our task is to use &lt;a href=&quot;http://web.mta.info/developers/turnstile.html&quot;&gt;MTA turnstile data&lt;/a&gt; to optimize the placement of street teams so that they can collect the most signatures from people who will attend the gala and contribute to its cause.&lt;/p&gt;

&lt;h2 id=&quot;our-approach&quot;&gt;Our Approach&lt;/h2&gt;
&lt;p&gt;The natural approach would be locating the stations with the highest volumes and placing the street teams during their busiest hours. However, is this approach sufficient? Can we make the assumption that every signature collected has an equal chance to be a potential donor?&lt;br /&gt;
&lt;a href=&quot;https://www.opensecrets.org/outsidespending/donor_stats.php?cycle=2016&amp;amp;type=I&quot;&gt;Open Secrets&lt;/a&gt; suggests that the top 1% of individual donors made 43% of all donations in the 2016 election cycle so far. This made us think that it’s not the quantity of signatures that matters but the quality. We should focus our attention on maximizing the opportunity to find the people that could potentially make substantial donations.&lt;br /&gt;
So here is an outline of our approach:&lt;br /&gt;
1. Rank the zip codes using demographic data and political donation data. Demographic data from &lt;a href=&quot;http://zipatlas.com/us/ny/new-york.htm&quot;&gt;ZIP atlas&lt;/a&gt; consists of the female population, the percentage that takes public transportation to work, and the percentage of 200k+ households. We use the product of these factors as a proxy for the availability of donors (DA). In the absence of data on donations to technology-awareness causes, we use the political donations made by women working in the technology sector as a proxy for the willingness of donors (DW). DA and DW are combined to produce the final rank.&lt;br /&gt;
2. Rank the stations based on volume. Thanks to Dave on our team, we were able to download all MTA turnstile data since 2010. We decided to focus on the three-month window from March to May because of its slightly higher-than-average volumes and its proximity to the gala.&lt;br /&gt;
3. Combine the two ranks to find the stations that are both rich in potential donors and high in volume.&lt;/p&gt;
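&lt;p&gt;The rank combination in step 3 can be sketched as follows (an illustrative helper; the station names and the exact combination rule are assumptions for the example):&lt;/p&gt;

```python
def combine_ranks(donor_rank, volume_rank):
    # donor_rank, volume_rank: dicts mapping station name to rank (1 = best)
    # a simple illustrative rule: sum the two ranks and sort, lower = better
    common = set(donor_rank).intersection(volume_rank)
    combined = {s: donor_rank[s] + volume_rank[s] for s in common}
    return sorted(combined, key=combined.get)
```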

&lt;h2 id=&quot;solution&quot;&gt;Solution&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;../images/zip_station.png&quot; alt=&quot;zipcode stations&quot; /&gt;&lt;br /&gt;
The map displays the rank of zip codes in heatmap format, with lighter color being donor-rich areas. The top 10 zip codes are a mixture of high-income residential areas and tech hubs. We use &lt;a href=&quot;https://developers.google.com/maps/documentation/geocoding/intro#ReverseGeocoding&quot;&gt;google reverse geocoding&lt;/a&gt; to find the stations that are located in these zip codes. Then we check if they also appear in the top 50 volume-ranked list. These are the five stations that we choose to be our target stations: &lt;br /&gt;
- 14 ST-UNION SQ LNQR456  &lt;br /&gt;
- 59 ST-COLUMBUS ABCD1 &lt;br /&gt;
- 72 ST 123 &lt;br /&gt;
- 66 ST-LINCOLN 1 &lt;br /&gt;
- 49 ST-7 AVE NQR &lt;br /&gt;
We then look at the volumes at the 5 target stations by weekday and by hour to determine the best time to deploy. The two heatmaps are created from turnstile data from March to May in 2015 and 2016.
&lt;img src=&quot;../images/byday.png&quot; alt=&quot;by weekday&quot; /&gt;
&lt;img src=&quot;../images/byhour.png&quot; alt=&quot;by hour&quot; /&gt;&lt;br /&gt;
It appears that the best day to deploy is Wednesday and best time is between 5-8 PM.&lt;/p&gt;

&lt;h2 id=&quot;scheduling&quot;&gt;Scheduling&lt;/h2&gt;
&lt;p&gt;Depending on the resources available to the staff team, the scheduling can get complicated. If a large number of staff is available, then we should probably set up a test run to find the approximate hit rate, and then decide how to allocate the team based on the foot traffic and hit rate at each station.&lt;/p&gt;
WomenTechWomenYes (WTWY) has an annual gala at the beginning of the summer. The management has decided to place street teams at entrances to subway stations to collect email addresses and those who sign up are sent free tickets to the gala.
Our task is to use MTA turnstile data to optimize the placement of street teams so that they can collect the most signatures from people who will attend the gala and contribute to its cause.</summary></entry></feed>
