I lead Force Five Partners, a marketing analytics consulting firm (bio). I've been writing here about marketing, technology, e-business, and analytics since 2003 (blog name explained).

116 posts categorized "Marketing"

January 09, 2013

My New Book: Pragmalytics

I've written a short book.  It's called "Pragmalytics: Practical Approaches to Marketing Analytics in the Digital Age".  It's a collection and synthesis of some of the things I've learned over the last several years about how to take better advantage of data (Big and little) to make better marketing decisions, and to get better returns on your investments in this area.  

The main point of the book is the need for orchestration.  I see too much of the focus today on "If we build It (the Big Data Machine, with some data scientist high priests to look after it), good things will happen."  My experience has been that you need to get "ecosystemic conditions" in balance to get value.  You need to agree on where to focus.  You need to get access to the data.  You need to have the operational flexibility to act on any insights.  And, you need to cultivate an "analytic marketer" mindset in your broader marketing team that blends perspectives, rather than cultivating an elite but blinkered cadre of "marketing analysts".  Over the next few weeks, I'll further outline some of what's in the book in a few posts here on my blog.

I'm really grateful to the folks who were kind enough to help me with the book.  The list includes: Mike Bernstein, Tip Clifton, Susan Ellerin, Ann Hackett, Perry Hewitt, Jeff Hupe, Ben Kline, Janelle Leonard, Sam Mawn-Mahlau, Bob Neuhaus, Judah Phillips, Trish Gorman Clifford, Rob Schmults, Michelle Seaton, Tad Staley, and my business partner, Jamie Schein.  As I said in the book, if you like any of it, they get credit for salvaging it.  The rest -- including several bits that even on the thousandth reading still aren't as clear as they should be, plus a couple of typos I need to fix -- are entirely my responsibility.

I'm also grateful to the wonderful firms and colleagues and clients I've had the good fortune to work for and with.  I've named the ones I can, but in general have erred on the side of respecting their privacy and confidentiality where the work isn't otherwise in the public domain.  To all of them: Thank You!

This field is evolving quickly in some ways, but there are also some timeless principles that apply to it.  So, there are bits of the book that I'm sure won't age well (including some that are already obsolete), but others that I hope might.  While I'm not one of those coveted Data Scientists by training, I'm deep into this stuff on a regular basis at whatever level is necessary to get a positive return from the effort.  So if you're looking for a book on selecting an appropriate regression technique, or tuning Hadoop, you won't find that here, but if you're looking for a book about how to keep all the balls in the air (and in your brain), it might be useful to you.  It's purposefully short -- about half the length of a typical business book.  My mental model was to make it about as thick as "The Elements of Style", since that's something I use a lot (though you probably won't think so!).  Plus, it's organized so you can jump in anywhere and snack as you wish, since this stuff can be toxic in large doses.

In writing it amidst all the Big Data craziness, I was reminded of Gandhi's saying (paraphrased) "First they ignore you... then they fight you, then you win."  Having been in the world of marketing analytics for a while now, I think it's appropriate to say that "First they ignore you, then they hype you, then you blend in."   We're now in the "hype" phase.  Not a day goes by without some big piece in the media about Big Data or Data Scientists (who have now hit the highly symbolic "$300k" salary benchmark; the last time we saw that, in the online ad sales world in the middle of the last decade, it was a sell signal, BTW).  "Pragmalytics" is more about the "blend in" phase, when all this "cool" stuff is more a part of the furniture that needs to work in harmony with the rest of the operation to make a difference.

"Pragmalytics" is available via Amazon (among other places).  If you read it please do me a favor and rate and review it, or even better, please get in touch if you have questions or suggestions for improving it.  FWIW, any earnings from it will go to Nashoba Learning Group (a school for kids with autism and related disorders).

Where it makes sense, I'd be very pleased to come talk to you and your colleagues about the ideas in the book and how to apply them, and possibly to explore working together.  Also, in a triumph of Hope over Experience, my next book (starting now) will be a collection and synthesis of interviews with other senior marketing executives trying to put Big Data to work.  So if you would be interested in sharing some experiences, or know folks who would, I'd love to talk.

About the cover:  it's meant to convey the harmonious convergence of "Mars", "Venus", and "Earth" mindsets: that is, a blend of analytic acuity, creativity and communication ability, and practicality and results-orientation that we try to bring to our work. Fellow nerds will appreciate that it's a Cumulative Distribution Function where the exponent is, in a nod to an example in the book, 1.007.



August 18, 2012

Gaming Facebook Sponsored Stories #fb #sponsoredstories

Facebook's Sponsored Stories feature is one of the ad targeting horses the firm's counting on to pull it out of its current valuation morass (read this, via @bussgang).  

Sponsored Stories is a virality-enhancing mechanism (no jokes please, that was an "a" not an "i") that allows Facebook advertisers to increase the reach of Facebook users' interactions with the advertisers' brands on Facebook (Likes, Check-ins, etc.). It does this by increasing the number of a user's Facebook friends who see such engagements with the advertisers' brands beyond the limited number who would, under normal application of the Facebook news feed algorithm, see those endorsements.

Many users are outraged that this unholy Son-Of-Beacon feature violates their privacy, to the point that they sue-and-settle (or try to, oops).

What they are missing perhaps is the opportunity to "surf" an advertiser's Sponsored Stories investment to amplify their own self-promotion or mere narcissism.

Consider the following simple example.  Starbucks is / has been using this ad program.  Let's say I go to Starbucks and "check in" on Facebook.  Juiced by Sponsored Stories (within the additional impressions Starbucks has paid for), all of my Facebook friends browsing their news feeds will see I've checked in at Starbucks (and presumably feel all verklempt about a brand that could attract such a valued friend). 

Now, what if I, savvy small business person, comment in my check in that I'm "at Starbucks, discussing my <link>NEW BOOK</link> with friends!"  I've pulled off the social media equivalent of pasting my bumper sticker on Starbucks' billboard.

I need to look more closely into this, but as I understand it, the Sponsored Stories feature can't today prevent users from including negative feedback in their brand engagements, where such flexibility is provided for.  So if they can't handle the negative yet, it may still be that they can't prevent more general forms of off-brand messaging.

I'm sure others have considered this and other possibilities. Comments very welcome!  Meanwhile, I'm off to Starbucks to discuss my upcoming NEW BOOK.



August 08, 2012

A "Common Requirements Framework" for Campaign Management Systems and Marketing Automation

In our "marketing analytics agency" model, as distinguished from a more traditional consulting one, we measure success not just by the quality of the insights and opportunities we can help clients to find, but by their ability to act on the ideas and get value for their investments.  Sometimes this means we simultaneously work both ends to an acceptable middle: even as we torture data and research for bright ideas, we help to define and influence the evolution of a marketing platform to be more capable. 

This raises the question, "What's a marketing platform, and a good roadmap for making it more capable?"  Lots of vendors, including big ones like IBM, are now investing in answering these questions, especially as they try to reach beyond IT to sell directly to the CMO. These vendors provide myriad marketing materials to describe both the landscape and their products, which variously are described as "campaign management systems" or even more gloriously as "marketing automation solutions".  The proliferation of solutions is so mind-blowing that analyst firms build whole practices making sense of the category.  Here's a recent chart from Terence Kawaja at LUMA Partners (via Scott Brinker's blog) that illustrates the point beautifully:



Yet even with this guidance, organizations struggle to get relevant stakeholders on the same page about what's needed and how to proceed. My own experience has been that this is because they're missing a simple "Common Requirements Framework" that everyone can share as a point of departure for the conversation.  Here's one I've found useful.

Basically marketing is about targeting the right customers and getting them the right content (product information, pricing, and all the before-during-and-after trimmings) through the right channels at the right time.  So, a marketing automation solution, well, automates this.  More specifically, since there are lots of homegrown hacks and point solutions for different pieces of this, what's really getting automated is the manual conversion and shuffling of files from one system to the next, aka the integration of it all.  Some of these solutions also let you run analysis and tests out of the same platform (or partnered components).

Each of these functions has increasing levels of sophistication, which I've grouped, as of this writing, into "basic", "threshold", and "advanced".  For simple roadmapping / prioritization purposes, you might also call these "now", "next", and "later".

Targeting


The simplest form of targeting uses a single data source, past experience at the cash register, to decide whom to go back to, on the idea that you build a business inside out from your best, most loyal customers.  Cataloguers have a fancy term for this, "RFM", which stands for "Recency, Frequency, and Monetary Value".  It grades customers, typically into deciles, according to... how recently, how frequently, and how much they've bought from you.  Folks who score high get solicited more intensively (for example, more catalog drops).  By looking back at a customer's past RFM-defined marginal value to you (e.g., gross margin you earned from stuff you sold her), you can make a decision about how much to spend marketing to her.  
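For fellow nerds, here's a minimal sketch of RFM grading in Python.  It's one plausible way to do it, not a standard implementation: the function name `rfm_scores`, the `(customer, date, amount)` record layout, and the sample customers are all made up for illustration, and ties are simply broken by the order in which customers first appear.

```python
from collections import defaultdict
from datetime import date

def rfm_scores(transactions, as_of, buckets=10):
    """Grade customers 1..buckets on recency, frequency, and monetary value.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (a hypothetical layout). A higher score is better on every dimension.
    """
    last_purchase, frequency, monetary = {}, defaultdict(int), defaultdict(float)
    for customer, when, amount in transactions:
        last_purchase[customer] = max(when, last_purchase.get(customer, when))
        frequency[customer] += 1
        monetary[customer] += amount

    # Negate days-since-last-purchase so that "more recent" ranks higher.
    recency = {c: -(as_of - d).days for c, d in last_purchase.items()}

    def grade(values):
        # Rank customers from worst to best, then cut the ranking into equal buckets.
        ranked = sorted(values, key=values.get)
        return {c: 1 + (i * buckets) // len(ranked) for i, c in enumerate(ranked)}

    r, f, m = grade(recency), grade(dict(frequency)), grade(dict(monetary))
    return {c: (r[c], f[c], m[c]) for c in last_purchase}

# Made-up sample data: three customers, graded into terciles instead of deciles.
sample = [
    ("ann", date(2012, 7, 30), 120.0),
    ("ann", date(2012, 6, 1), 80.0),
    ("bob", date(2011, 12, 15), 40.0),
    ("cat", date(2012, 5, 10), 60.0),
    ("cat", date(2012, 3, 1), 30.0),
    ("cat", date(2012, 1, 5), 20.0),
]
scores = rfm_scores(sample, as_of=date(2012, 8, 1), buckets=3)
```

With a real customer file you'd leave `buckets` at 10 to get deciles, and then decide how many catalog drops each grade earns.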

One step up, you add demographic and behavioral information about customers and prospects to refine and expand your lists of folks to target.  Demographically, for example, you might say, "Hey, my best customers all seem to come from Greenwich, CT.  Maybe I should target other folks who live there."  You might add a few other dimensions to that, like age and gender. Or you might buy synthetic, "psychographic" definitions from data vendors who roll a variety of demographic markers into inferred attitudes.  Behaviorally, you might say "Let's retarget folks who walk into our store, or who put stuff into our online shopping cart but don't check out."  These are conceptually straightforward things to do, but are logistically harder, because now you have to integrate external and internal data sources, comply with privacy policies, etc.

In the third level, you begin to formalize the models implicit in these prior two steps, and build lists of folks to target based on their predicted propensity to buy (lots) from you.  So for example, you might say, "Folks who bought this much of this product this frequently, this recently who live in Greenwich and who visited our web site last week have this probability of buying this much from me, so therefore I can afford to target them with a marketing program that costs $x per person."  That's "predictive modelling".
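To make the arithmetic concrete, here's a rough sketch in Python.  It assumes the simplest possible "model", namely the observed share of buyers in each segment, and treats the affordable spend as the breakeven point (propensity times order value times margin).  The function names, segment labels, and numbers are all hypothetical; a real predictive model would be fitted with proper statistical tooling.

```python
from collections import defaultdict

def segment_propensity(history):
    """Estimate P(buy) per segment as the observed share of buyers.

    history: iterable of (segment, bought) pairs, where bought is True/False.
    """
    totals, buyers = defaultdict(int), defaultdict(int)
    for segment, bought in history:
        totals[segment] += 1
        buyers[segment] += int(bought)
    return {s: buyers[s] / totals[s] for s in totals}

def affordable_spend(propensity, avg_order_value, margin_rate):
    """Breakeven marketing cost per person: expected gross margin from one contact."""
    return propensity * avg_order_value * margin_rate

# Made-up history: did each past contact in the segment go on to buy?
history = (
    [("greenwich_recent_visitor", True)] * 2
    + [("greenwich_recent_visitor", False)] * 2
    + [("everyone_else", True)] * 1
    + [("everyone_else", False)] * 4
)
propensity = segment_propensity(history)

# Assumed economics for illustration: $200 average order, 40% gross margin.
budget_per_person = {
    segment: affordable_spend(p, avg_order_value=200.0, margin_rate=0.4)
    for segment, p in propensity.items()
}
```

In this toy case the recent Greenwich visitors can carry a much richer program per head than everyone else, which is exactly the "$x per person" decision described above.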

Some folks evaluate the sophistication of a targeting capability by how fine-grained the target segments get, or by how close to 1-1 personalization you can get.  In my experience, there are often diminishing returns to this, usually because the firm can't practically execute differentiated experiences even when the marginal value of a personalized experience warrants it.  This isn't universally the case of course: promotional offers and similar experience variables (e.g., credit limits) are easier to vary than, say, a hotel lobby.  

Content


Again, a simple progression here, for me defined by the complexity of the content you can provide ("plain", "rich", "interactive") and by the flexibility and precision ("none", "pre-defined options", "custom options") with which you can target it through any given channel or combination of channels.

Another dimension to consider here is the complexity of the organizations and processes necessary to produce this content.  For example, in highly regulated environments like health care or financial services, you may need multiple approvals before you can publish something.  And the more folks involved, the more sophisticated and valuable the coordination tools become, ranging from central repositories for templates to version control systems, alerts, and even joint editing.  Beware, though, of simply paving cowpaths -- be sure you need all that content variety and process complexity before enabling it technologically, or it will simply expand to fit what the technology permits (the same way computer operating systems bloat as processors get more powerful).

Channels


The big dimension here is the number of channels you can string together for an integrated experience.  So for example, in a simple case you've got one channel, say email, to work with.  In a more sophisticated system, you can say, "When people who look like this come to our website, retarget them with ads in the display ad network we use." (Google just integrated Google Analytics with Google Display Network to do just this, for example, an ingenious move that further illustrates why they lead the pack in the display ad world.)  Pushing it even further, you could also say, "In addition to re-targeting web site visitors who do X, out in our display network, let's also send them an email / postcard combination, with connections to a landing page or phone center."

Analysis and Testing

In addition to execution of campaigns and programs, a marketing solution might also support exploration of what campaigns and programs, or components thereof, might work best.  This happens in a couple of ways.  You can examine past behavior of customers and prospects to look for trends and build models that explain how changes and saliencies along one or more dimensions might have been associated with buying.  Also, you can define and execute A/B and multi-variate tests (with control groups) for targeting, content, and channel choices.  
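As an illustration of the A/B case, here's a small Python sketch that compares a variant against a control group using a two-proportion z-test.  That's one common way to read such a test, not necessarily what any given platform does under the hood.  The function name `ab_test` and the sample counts are invented for the example.

```python
from math import erf, sqrt

def ab_test(control_conversions, control_size, variant_conversions, variant_size):
    """Two-proportion z-test comparing a variant's response rate with a control's."""
    p_control = control_conversions / control_size
    p_variant = variant_conversions / variant_size

    # Pooled response rate under the assumption that the variant changes nothing.
    pooled = (control_conversions + variant_conversions) / (control_size + variant_size)
    std_error = sqrt(pooled * (1 - pooled) * (1 / control_size + 1 / variant_size))
    z = (p_variant - p_control) / std_error

    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return {"lift": p_variant - p_control, "z": z, "p_value": p_value}

# Made-up counts: 10,000 people per cell, 2.0% vs. 2.6% response.
result = ab_test(control_conversions=200, control_size=10_000,
                 variant_conversions=260, variant_size=10_000)

# Sanity check: identical cells should show no lift at all.
no_difference = ab_test(200, 10_000, 200, 10_000)
```

A small `p_value` suggests the difference is unlikely to be noise; whether the lift is big enough to matter commercially is still a judgment call.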

Again, the question here is not just about how much data flexibility and algorithmic power you have to work with within the system, but how many integration hoops you have to go through to move from exploration to execution.  Obviously you won't want to run exploration and execution off the same physical data store, or even the same logical model, but it shouldn't take a major IT initiative to flip the right operational switches when you have an insight you'd like to try, or scale.

Concretely, the requirement you're evaluating here is best summarized by a couple of questions.  First, "Show me how I can track and evaluate differential response in the marketing campaigns and programs I execute through your proposed solution," and then, "Show me how I can define and test targeting, content, and channel variants of the base campaigns or programs, and then work the winners into a dominant share of our mix."

A Summary Picture

Here's a simple table that tries to bundle all of this up.  Notice that it focuses more on function than features, and on capabilities instead of components.  

  Marketing Automation Common Requirements Framework


What's Right For You?

The important thing to remember is that these functions and capabilities are means, not ends.  To figure out what you need, you should reflect first on how any particular combination of capabilities would fit into your marketing organization's "vector and momentum".  How is your marketing performance trending?  How does it compare with competitors'?  In what parts -- targets, content, channels -- is it better or worse? What have you deployed recently and learned through its operation? What kind of track record have you established in terms of successful deployment and leverage from your efforts?  

If your answers are more like "I don't know" and "Um, not a great one" then you might be better off signing onto a mostly-integrated, cloud-based (so you don't compound business value uncertainty with IT risk), good-enough-across-most-things solution for a few years until you sort out -- affordably (read, rent, don't buy) -- what works for you, and what capability you need to go deep on. If, on the other hand, you're confident you have a good grip on where your opportunities are and you've got momentum with and confidence in your team, you might add best-of-breed capabilities at the margins of the more general "logical model" this framework provides.  What's generally risky is to start with an under-performing operation built on spaghetti and plan for a smooth multi-year transition to a fully-integrated on-premise option.  That just puts too many moving parts into play, with too high an up-front, bet-on-the-come investment.

Again, remember that the point of a "Common Requirements Framework" isn't to serve as an exhaustive checklist for evaluating vendors.  It's best used as a simple model you can carry around in your head and share with others, so that when you do dive deep into requirements, you don't lose the forest for the trees, in a category that's become quite a jungle.  Got a better model, or suggestions for this one?  Let me know!

July 26, 2012

Wanted: Marketing Analytics Director, Global Financial Services Firm (Mid-Atlantic) # Analytics

I've been working with a global financial services firm to develop its marketing analytics / intelligence capability, and we're now building a highly capable team to further extend and sustain the results and lessons so far.  This includes a Marketing Analytics Director to lead a strong team doing advanced data mining and predictive modeling to support high-impact opportunities in various areas of the firm.  Here's the job description on LinkedIn.  If you are currently working at a large marketer, major analytics consulting firm, or advertising agency, and have significant experience analyzing, communicating, and implementing sophisticated multi-channel marketing programs, and are up for the challenge of leading a new team in this area for a world-class firm in a great city, please get in touch!

July 16, 2012

Congratulations @marissamayer on your new #Yahoo gig. Now what? Some ideas

Paul Simon wrote, "Every generation throws a hero up the pop charts."  Now it's Marissa Mayer's turn to try to make Yahoo!'s chart pop.  This will be hard because few tech companies are able to sustain value creation much past their IPOs.  

What strategic path for Yahoo! satisfies the following important requirements?

  • Solves a keenly felt customer / user / audience / human problem?
  • Fits within but doesn't totally overlap what other competitors provide?
  • Builds off things Yahoo! has / does well?
  • Fits Ms. Mayer's experiences, so she's playing from a position of strength and confidence?
  • As a consequence of all this, will bring advertisers back at premium prices?

Yahoo!'s company profile is a little buzzwordy but offers a potential point of departure.  What Yahoo! says:

"Our vision is to deliver your world, your way. We do that by using technology, insights, and intuition to create deeply personal digital experiences that keep more than half a billion people connected to what matters the most to them – across devices, on every continent, in more than 30 languages. And we connect advertisers to the consumers who matter to them most – the ones who will build their businesses – through our unique combination of Science + Art + Scale."

What Cesar infers:

Yahoo! is a filter.

Here are some big things the Internet helps us do:

  • Find
  • Connect
  • Share
  • Shop
  • Work
  • Learn
  • Argue
  • Relax
  • Filter

Every one of these functions has an 800 lb. gorilla, and a few aspirants, attached to it:

  • Find -- Google
  • Connect -- Facebook, LinkedIn
  • Share -- Facebook, Twitter, Yahoo!/Flickr (well, for the moment...)
  • Shop -- Amazon, eBay
  • Work -- Microsoft, Google, GitHub
  • Learn -- Wikipedia, Khan Academy
  • Argue -- Wordpress, Typepad, [insert major MSM digital presence here]
  • Relax -- Netflix, Hulu, Pandora, Spotify
  • Filter -- ...

Um, filter...  Filter.   There's a flood of information out there.  Who's doing a great job of filtering it for me?  Google alerts?  Useful but very crude.  Twitter?  I browse my followings for nuggets, but sometimes these are hard to parse from the droppings.  Facebook?  Sorry friends, but my inner sociopath complains it has to work too hard to sift the news I can use from the River of Life.

Filtering is still a tough, unsolved problem, arguably the problem of the age (or at least it was last year when I said so).  The best tool I've found for helping me build filters is Yahoo! Pipes.  (Example)

As far as I can tell, Pipes has remained this slightly wonky tool in Yahoo's bazaar suite of products.  Nerds like me get a lot of leverage from the service, but it's a bit hard to explain the concept, and the semi-programmatic interface is powerful but definitely not for the general public.

Now, what if Yahoo! were to embrace filtering as its core proposition, and build off the Pipes idea and experience under the guidance of Google's own UI guru -- the very same Ms. Mayer, hopefully applying the lessons of iGoogle's rise and fall -- to make it possible for its users to filter their worlds more effectively?  If you think about it, there are various services out there that tackle individual aspects of the filtering challenge: professional (e.g. NY Times, Vogue, Car and Driver), social (Facebook, subReddits), tribal (online communities extending from often offline affinities), algorithmic (Amazon-style collaborative filtering), sponsored (e.g., coupon sites).  No one is doing a good job of pulling these all together and allowing me to tailor their spews to my life.  Right now it's up to me to follow Gina Trapani's Lifehacker suggestion, which is to use Pipes.

OK so let's review:

  • Valuable unsolved problem for customers / users: check.
  • Fragmented, undominated competitive space: check.
  • Yahoo! has credible assets / experience: check.
  • Marissa Mayer plays from position of strength and experience: check.
  • Advertisers willing to pay premium prices, in droves: ...

Well, let's look at this a bit.  I'd argue that a good filter is effectively a "passive search engine".  Basically through the filters people construct -- effectively "stored searches" -- they tell you what it is they are really interested in, and in what context and time they want it.  With cookie-based targeting under pressure on multiple fronts, advertisers will be looking for impression inventories that provide search-like value propositions without the tracking headaches.  Whoever can do this well could make major bank from advertisers looking for an alternative to the online ad biz Hydra (aka Google, Facebook, Apple, plus assorted minor others).

Savvy advertisers and publishers will pooh-pooh the idea that individual Pipemakers would be numerous enough or consistent enough on their own to provide the reach that is the reason Yahoo! is still in business.  But I think there are lots of ways around this.  For one, there's already plenty of precedent at other media companies for suggesting proto-Pipes -- usually called "channels", though Yahoo! calls them "sites" (example) -- and they have RSS feeds.  Portals like Yahoo!, major media like the NYT, and universities like Harvard suggest categories, offer pre-packaged RSS feeds, and even give you the ability to roll your own feed out of their content.  The problem is that it's still marketed as RSS, which even in this day and age is still a bit beyond most folks.  But if you find a more user-friendly way to "clone and extend" suggested Pipes, friends' Pipes, sponsored Pipes, etc., you've got a start.

Check?  Lots of hand-waving, I know.  But what's true is that Yahoo! has suffered from a loss of a clear identity.  And the path to re-growing its value starts with fixing that problem.

Good luck Marissa!




July 11, 2012

Wonderfully #Pragmalytic Multi-Channel Attribution Advice From @avinash via @visualiq

Via my friends at VisualIQ, this wonderful post from Avinash Kaushik on doing multi-channel attribution and mix optimization in the real world.  Plus a really rich set of conversations in the comments. My summary of his advice (reassuringly consistent with my own experiences with "pragmalytic" approaches):

  • Start by solving for specific attribution / optimization use cases you face in the real world, not the more general form of the challenge.  He names three dominant ones he sees: "O2S -- Online to Store", "AMS -- Across Multiple Screens", and "ADC -- Across Digital Channels"
  • Use multiple analytic techniques to compensate for imperfect data that any one technique might rely on.  For example, if there are holes or quality problems with your data, supplement it with controlled tests
  • Don't cop out, but accept that there are no perfect answers, just better ones, and that you should bias toward acting on acceptably imperfect information and learning and improving based on actual experience

Absolutely terrific stuff here, gets even better on the third and subsequent reads.

July 03, 2012

#Microsoft Writes Off #aQuantive. What Can We Learn?

In May 2007, Microsoft paid $6 billion to buy aQuantive.  Today, only five years later, they wrote off the whole investment.  Since I wrote about this a lot five years ago (here, here and here), it prompted me to think about what happened, and what I might learn.  Here are a few observations:

1. 2006 / 2007 was a frothy time in the ad network market, both for ads and for the firms themselves, reflecting the economy in general.

2. Microsoft came late to the party, chasing aQuantive (desperately) after Google had taken DoubleClick off the table.

3. So, Microsoft paid a 100% premium to aQuantive's market cap to get the firm.

4. Here's the way Microsoft might have been seeing things at the time:

a. "Thick client OS and productivity applications business in decline -- the future is in the cloud."

b. "Cloud business model uncertain, but certainly lower price point than our desktop franchise; must explore all options; maybe an ad-supported version of a cloud-based productivity suite?"

c. "We have MSN.  Why should someone else sit between us and our MSN advertisers and collect a toll on our non-premium, non-direct inventory?  In fact, if we had an ad network, we could sit between advertisers and other publishers and collect a toll!"

5. Here's the way things played out:

a. The economy crashed a year later.

b. When budgets came back, they went first to the most accountable digital ad spend: search.  

c. Microsoft had a new horse in that race: Bing (launched June 2009).  Discretionary investment naturally flowed there.

d. Meanwhile, "display" evolved:  video display, social display (aka Facebook), mobile display (Dadgurnit!  Google bought AdMob, Apple has iAd!  Scraps again for the rest of us...). (Good recent eMarketer presentation on trends here.)

e. Whatever's left of "traditional" display: Google / DoubleClick, as the category leader, eats first.

f. Specialized players do continue to grow in "traditional" display, through better targeting technologies (BT) and through facilitating more efficient buys (for example, DataXu, which I wrote about here).  But to grow you have to invest and innovate, and at Microsoft, by this point, as noted above, the money was going elsewhere.

g. So, if you're Microsoft, and you're getting left behind, what do you do?  Take 'em with you!  "Do not track by default" in IE 10 as of June 2012.  That's old school medieval, dressed up in hipster specs and a porkpie hat.  Steve Ballmer may be struggling strategically, but he's still as brutal as ever. 

6. Perspective

a. $6 Big Ones is only 2% of MSFT's market cap.  aQuantive may have come at  a 2x premium, but it was worth the hedge.  The rich are different from you and me.  

b. The bigger issue though is how does MSFT steal a march on Google, Apple, Facebook? Hmmm. Video's hot.  Still bandwidth constrained, but that'll get better.  And there's interactive video. Folks will eventually spend lots of time there, and ads will follow them. Google's got Hangouts, Apple's got FaceTime and iChat, Facebook's got video calling... and now MSFT has Skype, for $8B.   Hmm.

7. Postscripts:

a. Some of the smartest business guys I worked with at Bain in the late 90's (including Torrence Boone and Jason Trevisan) ended up at aQuantive and helped to build it into the success it was.  An interesting alumni diaspora to follow.

b. Some of the smartest folks I worked with at Razorfish in the early 2000's (including Bob Lord) ended up at aQuantive. The best part is that Microsoft may have gotten more value from buying and selling Razorfish (to Publicis) than from buying and writing off the rest of aQuantive.  Sweet, that.

c. Why not open-source Atlas?

March 20, 2012

Organic Data Modeling in the Age of the Extrabase #analytics

Sorry for the buzzwordy title of this post, but hopefully you'll agree that sometimes they can be useful to communicating an important Zeitgeist.

I'm working with one of our clients right now to develop a new, advanced business intelligence capability that uses state-of-the-art in-memory data visualization tools like Tableau and Spotfire that will ultimately connect multiple data sets to answer a range of important questions.  I've also been involved recently in a major analysis of advertising effectiveness that included a number of data sources that were either external to the organization, or non-traditional, or both.  In both cases, these efforts are likely to evolve toward predictive models of behavior to help prioritize efforts and allocate scarce resources.

Simultaneously, today's NYT carried an article about Clear Story, a Silicon Valley startup that aggregates APIs to public data sources about folks, and provides a highly simplified interface to those APIs for analysts and business execs.  I haven't yet tried their service, but I'll save that for a separate post.  The point here is that the emergence of services like this represents an important step in the evolution of Web 2.0 -- call it Web 2.2 -- that's very relevant for marketing analytics in enterprise contexts.

So, what's significant about these experiences?

Readers of Ralph Kimball's classic Data Warehouse Toolkit will appreciate both the wisdom of his advice and how much the context for it has changed today.  Kimball is absolutely an advocate for starting with a clear idea of the questions you'd like to answer and for making pragmatic choices about how to organize information to answer them.  However, the major editions of the book were written in a time when three things were true:

  • You needed to organize information more thoughtfully up front, because computing resources to compensate for poor initial organization were less capable and more expensive
  • The number of data sources you could integrate was far more limited, allowing you to be more definitive up front about the data structures you defined to answer your target questions
  • The questions themselves, or the range of possible answers to them, were more limited and less dynamic, because the market context was so as well

Together, these things made for business intelligence / data warehouse / data management efforts that were longer, and a bit more "waterfall" and episodic in execution.  However, over the past decade, many have critiqued such efforts for their high failure rates, mostly cases in which they collapse under their own weight: too much investment, too much complexity, too few results.  Call this Planned Data Modeling.

Now back to the first experience I described above.  We're using the tools I mentioned to simultaneously hunt for valuable insights that will help pay the freight of the effort, define useful interfaces for users to keep using, and through these efforts, also determine the optimal data structures we need underneath to scale from the few million rows in one big flat file we've started with to something that will no doubt be larger, more multi-faceted, and thus more complex.  In particular, we're using the ability of these tools to calculate synthetic variables on the fly out of the raw data to point the way toward summaries and indexes we'll eventually have to develop in our data repository.  This will improve the likelihood that the way we architect that repository will directly support real reporting and analysis requirements, prioritized based on actual usage in initial pilots, rather than speculative requirements obtained through more conventional means.  Call this Organic Data Modeling.
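To make the idea concrete, here is a minimal sketch in Python of "let usage point the way."  It is not how Tableau or Spotfire work internally, and every name in it (OrganicModel, define, column, materialization_candidates, the sample rows) is hypothetical.  It simply computes synthetic variables on the fly from a flat file's rows, counts how often analysts ask for each one, and nominates the frequently used ones as candidates for permanent summaries or indexes in the repository.

```python
from collections import Counter


class OrganicModel:
    """Hypothetical helper: derived fields computed on demand, with usage tracking."""

    def __init__(self, rows):
        self.rows = rows          # raw flat-file rows, one dict per row
        self.derived = {}         # name -> function(row) -> value
        self.usage = Counter()    # name -> number of times requested

    def define(self, name, fn):
        """Register a synthetic variable without storing any computed values."""
        self.derived[name] = fn

    def column(self, name):
        """Compute the synthetic variable on the fly and record that it was used."""
        self.usage[name] += 1
        fn = self.derived[name]
        return [fn(row) for row in self.rows]

    def materialization_candidates(self, min_uses=2):
        """Variables requested often enough to deserve a place in the repository."""
        return [name for name, n in self.usage.most_common() if n >= min_uses]


# Illustrative rows only; not real client data.
rows = [
    {"spend": 100.0, "impressions": 50_000, "clicks": 250},
    {"spend": 300.0, "impressions": 100_000, "clicks": 400},
]

model = OrganicModel(rows)
model.define("cpm", lambda r: 1000 * r["spend"] / r["impressions"])
model.define("ctr", lambda r: r["clicks"] / r["impressions"])

model.column("cpm")
model.column("cpm")
model.column("ctr")

print(model.materialization_candidates())
```

In this toy run only "cpm" crosses the usage threshold, so it is the one the data modelers would look at first when deciding what to build into the back end.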

Further, the work we've done anticipates that we will be weaving together a number of new sources of data, many of them externally provided, and that we'll likely swap sources in and out as we find that some are more useful than others.  It occurred to me that this large, heterogeneous, and dynamic collection of data sources would have characteristics sufficiently different in terms of their analytic and administrative implications that a different name altogether might be in order for the sum of the pieces.  Hence, the Extrabase.

These terms are not meant to cover up a cop-out.  In other words, some might say that mashing up a bunch of files in an in-memory visualization tool could reflect, and further contribute to, a lack of the intellectual discipline and wherewithal to get it right.  In our case, we're hedging that risk by having the data modelers responsible for figuring out the optimal data repository structure work extremely closely with the "front-end" analysts, so that as potential data structure implications flow out of the rubber-meets-the-road analysis, we're able to sift them and decide which should stick and which we can ignore.

But, as they say sometimes in software, "that's a feature, not a bug."  Meaning, mashing up files in these tools and seeing what's useful is a way of paying for and disciplining the back end data management process more rigorously, so that what gets built is based on what folks actually need, and gets delivered faster to boot.

March 12, 2012

#SXSW Trip Report Part 2: Being There

(See here for Part 1)

Here's one summary of the experience that's making the rounds:


[Image: Missing SXSW]


I wasn't able to be there all that long, but my impression was different.  Men of all colors (especially if you count tattoos), and lots more women (many tattooed also, and extensively).   I had a chance to talk with Doc Searls (I'm a huge Cluetrain fan) briefly at the Digital Harvard reception at The Parish; he suggested (my words) the increased ratio of women is a good barometer for the evolution of the festival from narcissistic nerdiness toward more sensible substance.  Nonetheless, on the surface, it does remain a sweaty mosh pit of digital love and frenzied networking.  Picture Dumbo on spring break on 6th and San Jacinto.  With light sabers:


[Image: SXSW light sabers]


Sight that will haunt my dreams for a while: VC-looking guy, blazer and dress shirt, in a pedicab piloted by a skinny, grungy student (?).  Dude, learn Linux, and your next tip from The Man at SXSW might just be a term sheet.

So whom did I meet, and what did I learn:

I had a great time listening to PRX.org's John Barth.  The Public Radio Exchange aggregates independent content suitable for radio (think The Moth), adds valuable services like consistent content metadata and rights management, and then acts as a distribution hub for stations that want to use it.  We talked about how they're planning to analyze listenership patterns with that metadata and other stuff (maybe gleaning audience demographics via Quantcast) for shaping content and targeting listeners.  He related, for example, that stations seem to prefer either one-hour programs they can use to fill standard-sized holes, or two- to seven-minute segments they can weave into pre-existing programs.  Documentary-style shows that weave music and informed commentary together are especially popular.  We explored whether production templates ("structured collaboration": think "Mad Libs" for digital media) might make sense.  Maybe later.

Paul Payack explained his Global Language Monitor service to me, and we explored its potential application as a complement if not a replacement for episodic brand trackers.  Think of it as a more sophisticated and source-ecumenical version of Google Insights for Search.

Kara Oehler's presentation on her Mapping Main Street project was great, and it made me want to try her Zeega.org service (a Harvard metaLAB project) as soon as it's available, to see how close I can get to replicating The Yellow Submarine for my son, with other family members spliced in for The Beatles.  Add it to my list of other cool projects I like, such as mrpicassohead.

Peter Boyce and Zach Hamed from Hack Harvard, nice to meet you. Here's a book that grew out of the class at MIT I mentioned -- maybe you guys could cobble together an O'Reilly deal out of your work!

Finally,  congrats to Perry Hewitt (here with Anne Cushing) and all her Harvard colleagues on a great evening!


[Image: Perry Hewitt and Anne Cushing]



January 26, 2012

Controlling for Impression Volatility in Digital Ad Spend Tests @DataXu

I've recently been involved in evaluating the results of a matched market test that looked at the impact of changes in digital advertising spend by comparing test vs. control markets, and by comparing differential lift in these markets over prior periods (e.g., year on year).  One of the challenges involved in such tests is significant "impression volatility" across time periods -- basically, each dollar can buy you very different volumes of impressions from year to year.  
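For readers who want the arithmetic spelled out, here is a minimal sketch of the two quantities in play.  The function names and every number are hypothetical, and I'm assuming "differential lift" means the test markets' year-on-year change minus the control markets' year-on-year change, with "impression volatility" measured as the change in impressions bought per dollar.

```python
def pct_change(current, prior):
    """Period-over-period change, as a fraction of the prior period."""
    return (current - prior) / prior


def differential_lift(test_now, test_prior, control_now, control_prior):
    """Test-market change minus control-market change (a difference of differences)."""
    return pct_change(test_now, test_prior) - pct_change(control_now, control_prior)


def impressions_per_dollar(impressions, spend):
    """How much 'volume' each dollar bought in a given period."""
    return impressions / spend


# Hypothetical outcome metric (say, sales) in test and control markets, this year vs. last.
sales_lift = differential_lift(test_now=1200, test_prior=1000,
                               control_now=1050, control_prior=1000)

# Hypothetical buys: the same spend purchases fewer impressions this year.
ipd_prior = impressions_per_dollar(impressions=5_000_000, spend=10_000)
ipd_now = impressions_per_dollar(impressions=3_500_000, spend=10_000)
impression_volatility = pct_change(ipd_now, ipd_prior)

print(f"differential lift: {sales_lift:.1%}")
print(f"change in impressions per dollar: {impression_volatility:.1%}")
```

The second number is the nuisance: if it swings a lot between periods, the same spend change does not represent the same change in exposure, which muddies how you read the first number.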

You can unpack this volatility into at least three components:  

  • changes in overall macro-economic conditions that drive target audiences' attention,
  • changes in the buying approach you took / networks you bought through, due to network-specific structural (like what publishers are included) and supply-demand drivers (like the relative effectiveness of the network's targeting approach)
  • changes in "buy-specific" parameters (like audiences and placements sought).

Let's assume that you handle the first with your test / control market structure.  Let's also assume that the third is to be held constant as much as possible, for the purposes of the test (that is, buying the same properties / audiences, and using the same ad positions / placements for the tests).   So my question was, how much volatility does the second factor contribute, and what can be done to control for that in a test?

Surfing around, I came across DataXu's March 2011 Market Pulse study.  DataXu is a service that allows you to buy across networks more efficiently in real time, sort of like what Kayak would be to travel if it were a fully automated agent and you flew every day.  The firm noted a year-on-year drop in average daily CPM volatility from 102% to 42% from May 2010 to February 2011 (meaning, I think, the average day-to-day change in price across all networks in each of the two months compared).  They attributed this to "dramatically increased volume of impressions bought and sold as well as maturation of trading systems".  Notwithstanding, the study still pointed to a 342% difference in average indexed CPMs across networks during February 2011.

A number this big naturally piqued my interest, and so I read into the report to understand it better.  The top of page 2 of the report summary presents a nice graph that shows average monthly indexed CPMs across 11 networks, and indeed shows the difference between the highest-priced and the lowest-priced network to be 342%.  Applying "Olympic scoring" (tossing out highest- and lowest-priced exchanges) cuts that difference to about 180%, or roughly by half -- still a significant discrepancy of course.  Looking further, one standard deviation in the whole sample (including the top and bottom values) is about 44%.  Again, though perhaps a bit less dramatic for marketers' tastes, still lots.
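Here is a rough sketch of those three calculations, for anyone who wants to run them on their own network-level prices.  The eleven indexed CPM values below are made up for illustration and are not the figures from the DataXu report.  I'm also assuming the "difference" is the gap between highest and lowest expressed as a fraction of the lowest, and that the standard deviation is expressed relative to the mean; the report summary may define these differently.

```python
from statistics import mean, pstdev


def pct_spread(cpms):
    """Gap between highest and lowest price, as a fraction of the lowest (assumed definition)."""
    lowest, highest = min(cpms), max(cpms)
    return (highest - lowest) / lowest


def olympic_trim(cpms):
    """'Olympic scoring': drop the single highest and single lowest value."""
    return sorted(cpms)[1:-1]


def relative_std(cpms):
    """Population standard deviation as a fraction of the mean (assumed definition)."""
    return pstdev(cpms) / mean(cpms)


# Hypothetical indexed CPMs for 11 networks; NOT the report's data.
hypothetical_indexed_cpms = [50, 70, 80, 90, 95, 100, 105, 110, 120, 140, 200]

print(f"full spread:     {pct_spread(hypothetical_indexed_cpms):.0%}")
print(f"trimmed spread:  {pct_spread(olympic_trim(hypothetical_indexed_cpms)):.0%}")
print(f"relative std:    {relative_std(hypothetical_indexed_cpms):.0%}")
```

Trimming only the two extremes is a blunt instrument, but it is a quick way to see how much of a headline spread comes from one or two outlier networks.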

(It's hard to know how "equivalent" the buys compared were, in terms of volumes, contextual consistency, and audience consistency, since the summary doesn't address these.  But let's assume they were, roughly.)

So what? If your (display) ad buys are not especially property-specific or audience-targeted, so that run-of-network buys in contextual or audience categories are OK, future tests might channel buys through services like DataXu and declare the buys "fully price-optimized" across the periods and markets compared.  That would allow you to ignore +/- ~50% "impression volatility" swings, assuming the Feb 2011 spreads hold.

However, if what you're buying is very specific -- and only available through direct purchase, or one or two specialized networks at most -- then you can ignore factor 2, trust the laws of supply and demand, and assume that you've bought essentially the same "attention" regardless of the difference in impressions.

I've asked some knowledgeable friends to suggest some perspectives on this, and will pass along their ideas.  Other feedback welcome, especially from digital advertising / testing pros!  Oh and if you're really interested, check out the DataXu TC50 2009 pitch video.


Books by
Cesar Brea