Tuesday, January 15, 2013

How to Automatically Parse YouTube View Statistics using importxml?

An automatic way to import YouTube video view statistics to Google Spreadsheets is to use importxml() function. In this post, a working example will be demonstrated using the following YouTube playlist:  https://www.youtube.com/playlist?list=PL38214462962EB8A0

First, the page source of the YouTube channel needs to be opened to get which parse value to use while importing the view statistics.As shown below, details of each video is listed under "video-overview" class.


<div class="video-overview">
    <h3 class="video-title-container">

      <a href="/watch?v=WRGfYqqjXDo&amp;list=PL38214462962EB8A0&amp;index=2" class="yt-uix-tile-link yt-uix-sessionlink" data-sessionlink="feature=plpp_video&amp;ei=CPbFlenq67QCFZYUIQodkUULKQ%3D%3D">
        <span class="title video-title "  dir="ltr">Corner Reflector / Retroreflector (FDTD Animation)</span>
      </a>
    </h3>

      <p class="video-details">
        <span class="video-owner">
by <span class="yt-user-name " dir="ltr">meyavuz</span>
        </span>

          <span class="video-view-count">
            151 views
          </span>
      </p>

  </div>

 Thus, in the Google Spreadsheets, we can type the following in one of the cells:

=importxml(A1,"//div[@class='video-overview']")

where cell A1 has the web address of the YouTube playlist we are working with. Once this is done, the spreadsheet will automatically populate the cells with the titles, video uploader and video views etc as shown below:


However, the cells containing the video view count also includes the user name and some other text, e.g. for this playlist, we have "by meyavuz48 views". To extract the video counts from this text, we need to use some of the functions of the Google Spreadsheets. For this particular case, typing

=value(replace(left(C1,len(C1)-5),1,10,""))


will provide the view statistics in integer form as shown below: Basically, "5" in the formula corresponds to number of characters of the word "views". Then, using the "replace" function, the first 10 characters "by meyavuz" are replaced by empty string "". Finally, the text is converted to integer using the "value()" function.




Note that using "video-overview" as the class inside importxml() function provides the titles and view count of each video. If only the video count is sought after, one can use  'video-view-count' keywords as input to the importxml function as:

"=importxml(A1,"//span[@class='video-view-count']")

Also, the full list of Google Spreadsheets function can be found in the following link:
http://support.google.com/drive/bin/static.py?hl=en&topic=25273&page=table.cs&tab=1240296

3 comments:

  1. So this works for playlists. Great! However, is there a similar code I could use for individual videos? Or is there a code that I could use to monitor all my videos?

    "=importxml(A1,"//span[@class='video-view-count']")

    only seems to work when playlists are involved.

    ReplyDelete
  2. For that case then you need to look at the page source info. For individual videos, the page source info says that you should use "watch-view-count" instead of "video-view-count". The following code works:

    =importxml(A1,"//span[@class='watch-view-count ']")

    where A1 is the link to the individual video, e.g. http://www.youtube.com/watch?v=785kRIZ7aeI

    ReplyDelete