Project Description

TagCloudSoma is a project by SOMABarcelona for creating a tag cloud from a URL or from input text.

We believe a word cloud is a good way to summarize someone's online profile. It can be applied to websites, blogs, Twitter and other sources. At SOMA we have implemented a beta version and are releasing it to the community so that others can explore its possibilities and applications.

Given a source (plain text or a URL), this version extracts its most important words and their weights. Words can be excluded from the list by adding them to a dictionary (one resource file per language; the application uses the resource that matches the culture of the executing thread). Patterns for excluding words can also be defined as regular expressions, which are likewise stored in a resource file.
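The filtering and weighting described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's .NET code: the stop-word sets and exclusion patterns are hypothetical stand-ins for the dictionary and pattern resource files, the culture is passed as a parameter instead of being read from the executing thread, and the weight is assumed to be each word's count divided by the highest count.

```python
import re
from collections import Counter

# Hypothetical per-language stop-word sets, standing in for the
# one-resource-per-language dictionary files.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "is", "in"},
    "es": {"el", "la", "los", "las", "y", "de", "en", "que"},
}

# Hypothetical exclusion patterns, standing in for the regex resource file:
# pure numbers and words of one or two characters.
EXCLUDE_PATTERNS = [re.compile(r"^\d+$"), re.compile(r"^.{1,2}$")]


def weighted_words(text, culture="en", top_n=10):
    """Return (word, count, weight) tuples, most frequent first.

    Assumption: weight = count / highest count, so the top word gets 1.0.
    """
    stop_words = STOP_WORDS.get(culture, set())
    words = re.findall(r"\w+", text.lower())
    # Drop dictionary words and anything matching an exclusion pattern.
    kept = [
        w for w in words
        if w not in stop_words
        and not any(p.search(w) for p in EXCLUDE_PATTERNS)
    ]
    # most_common sorts by count; ties keep first-seen order.
    counts = Counter(kept).most_common(top_n)
    if not counts:
        return []
    max_count = counts[0][1]
    return [(w, c, c / max_count) for w, c in counts]
```

For example, `weighted_words("The cloud and the tag. Tag cloud, tag!")` returns `[('tag', 3, 1.0), ('cloud', 2, 0.6666666666666666)]`. Normalizing against the top count keeps weights in the range 0 to 1, which makes them easy to map to font sizes regardless of text length.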


Solution contents:
  • TagCloudSoma Project: Includes the functions needed to build the tag cloud.
  • GetSource: Gets the source code of a webpage.
  • StripHTMLTags: Strips all HTML tags from the input text.
  • Cleaner: Cleans the input text by removing punctuation marks, brackets, telephone numbers, email addresses, etc. You can add your own regular expressions to the patterns.resx file.
  • WordCounter: Counts words and sorts them by number of occurrences. It also excludes common words for the current CultureInfo, using the dictionary.resx resource.
  • TestTagCloud Project: NUnit tests for the TagCloudSoma project.
  • Main Project: An example project that gives you a visual way to get started. It is self-explanatory.
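A rough Python sketch of the StripHTMLTags and Cleaner steps is shown below. It is not the project's implementation: it uses the standard-library `html.parser` module to drop tags (and the contents of `script` and `style` elements), and the cleaning regular expressions are illustrative examples of the kind of rules that could go in patterns.resx.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html_tags(html):
    """Return the visible text of an HTML document, tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Illustrative cleaning rules (hypothetical, in the spirit of patterns.resx).
# Order matters: emails and phone numbers go before punctuation is removed.
CLEAN_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),  # email addresses
    re.compile(r"\+?\d[\d\s().-]{6,}\d"),       # telephone-like digit runs
    re.compile(r"[^\w\s]"),                     # punctuation and brackets
]


def clean(text):
    """Replace every match with a space, then collapse whitespace."""
    for pattern in CLEAN_PATTERNS:
        text = pattern.sub(" ", text)
    return " ".join(text.split())
```

For example, `clean(strip_html_tags("<p>Mail info@somabarcelona.com (Barcelona).</p>"))` returns `"Mail Barcelona"`. Using a real HTML parser rather than a single tag-matching regex avoids leaking script and style code into the word counts.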


Future Releases

If you would like to help us improve the project further, please contact us at info@somabarcelona.com or visit www.somabarcelona.com.

Please read the license for terms and conditions.

Last edited Sep 28, 2009 at 2:26 PM by SOMADevelopers, version 4