In the quest to design a social reader that's fun and useful, Tyler and I have previously leaned heavily on semantic entity extraction from web feeds. While useful, entity matching APIs are lossy, noisy, and can become expensive very quickly (many thousands of dollars per month). We believe that between the creators and curators of content there is more than enough "human intelligence" out there to categorize media. In nearly all instances, a knowledgeable reader can tag content more precisely than an automated service.
Here are the benefits of setting up opengard.in this way:
- creators and curators will get virtual rewards for contributing
- we avoid the heavy cost of automated semantic tagging across a large number of feeds by using what's already there and creating a framework for additional human knowledge.
- we can build statistical algorithms that suggest tags by learning from the tagging decisions humans make for a given piece of content. I think this could be a fun project where we can leverage open source tools and algorithms. I was thinking of starting with titles, first paragraphs, and song/video beginnings and converting them to binary arrays of "word" data. There may even be correlations between compression and content topics that are worth exploring.
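
To make the binary "word" array idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a settled design: the function names (`tokenize`, `build_vocabulary`, `to_binary_vector`, `suggest_tags`) are hypothetical, the sample titles and tags are made up, and scoring by Jaccard similarity is just one simple choice among many possible statistical approaches.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_vocabulary(texts):
    """Map every word seen in the texts to a fixed array position."""
    words = sorted(set().union(*(tokenize(t) for t in texts)))
    return {word: i for i, word in enumerate(words)}


def to_binary_vector(text, vocabulary):
    """Return a 0/1 list: 1 where the vocabulary word occurs in the text."""
    vector = [0] * len(vocabulary)
    for word in tokenize(text):
        index = vocabulary.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] = 1
    return vector


def jaccard(a, b):
    """Shared 1-positions divided by positions set in either vector."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def suggest_tags(text, labeled, vocabulary, top_n=3):
    """Score each human-applied tag by similarity to already-tagged items."""
    query = to_binary_vector(text, vocabulary)
    scores = Counter()
    for item_text, tags in labeled:
        similarity = jaccard(query, to_binary_vector(item_text, vocabulary))
        for tag in tags:
            scores[tag] += similarity
    return [tag for tag, score in scores.most_common(top_n) if score > 0]


# Made-up examples standing in for titles that readers have already tagged.
labeled = [
    ("New jazz album review: a quiet trumpet masterpiece", ["music", "jazz"]),
    ("Python 3 release notes and new syntax features", ["programming", "python"]),
    ("Live jazz trumpet session recorded in Chicago", ["music", "jazz"]),
]
vocabulary = build_vocabulary([title for title, _ in labeled])

suggestions = suggest_tags("Trumpet solo from a jazz festival", labeled, vocabulary)
print(suggestions)
```

A new title is converted to the same kind of binary array as the tagged ones, and each tag accumulates the similarity of the items it was applied to, so the tags readers attached to the most similar content float to the top. Anything smarter (weighting rare words, a trained classifier, or compression-based distances) could replace `jaccard` without changing the overall shape.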