Suggesting Categories for Pages
Goal
Provide a web service that takes a URL and returns a list of categories for that page. These categories would be used to provide suggested topic names when the user pins a site.
For example, a pancake recipe on the Food Network website would return topics such as "Food", "Recipe", and "Pancakes".
Ideally, these categories should also be cached or stored globally so that we can later reuse them for other experiments, such as finding similar pages or suggesting pages.
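To make the goal concrete, here is a minimal sketch of the lookup-with-cache idea in Python. Everything in it is an assumption for illustration: the names `CategoryService`, `normalize_url`, and `toy_classifier` are hypothetical, the cache is a plain in-process dictionary rather than a global store, and the classifier only matches keywords in the URL instead of analyzing the page itself.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit, urlunsplit

# A classifier maps a URL to a list of category names (hypothetical interface).
Classifier = Callable[[str], List[str]]


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host and drop the fragment,
    so equivalent URLs share one cache entry."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


class CategoryService:
    """Returns categories for a URL, remembering earlier answers.

    The dict stands in for whatever shared store the real service would use.
    """

    def __init__(self, classify: Classifier) -> None:
        self._classify = classify
        self._cache: Dict[str, List[str]] = {}

    def suggest(self, url: str) -> List[str]:
        key = normalize_url(url)
        if key not in self._cache:
            # Cache miss: run the (possibly slow) classifier once per page.
            self._cache[key] = self._classify(key)
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[key])


def toy_classifier(url: str) -> List[str]:
    """Placeholder classifier: keyword matching on the URL only (assumption)."""
    keywords = {"food": "Food", "recipe": "Recipe", "pancake": "Pancakes"}
    lowered = url.lower()
    return [label for needle, label in keywords.items() if needle in lowered]


service = CategoryService(toy_classifier)
suggestions = service.suggest("http://www.FoodNetwork.com/recipes/pancakes#top")
```

Keeping the classifier behind a plain callable means the cache layer would not need to change if the keyword matcher were swapped for something that actually inspects page content.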
Performance
Can we provide suggestions in near real time? Can we have these categories ready as soon as the user hits the Pin button and the Pin dialog appears? Do we have to?
Quality
Can we present the user with categories that make sense? Can we create a category suggestion service that is accurate?
Infrastructure
What would the server-side part of this look like? Can we scale it up as the number of users and pages in Pancake grows?