Data crowdsourcing projects around Sandy: Which projects are able to engage the crowd?

On Saturday, November 3rd, a #HurricaneHackers hackathon took place at the MIT Media Lab, as part of a series of SandyCrisisCam events. The event aimed to build technological tools that help with Sandy recovery. I (Mayo Fuster Morell) participated in the event, and it was a fantastic experience.

There were around 20 of us, and many of us did not know each other beforehand. During the morning, we each got a sense of how the others wanted to contribute. Most people wanted to develop new tools, and they did great stuff. However, it was unclear to me which tools already existed, and whether and how the tools being developed were actually used; that is, whether they actually engaged the crowd. So I thought I would contribute by making a list of data crowdsourcing projects for Sandy recovery and analyzing their level and type of usage. In this post, I explain what I did.

What are data crowdsourcing projects?

By data crowdsourcing projects I mean those based on the collaborative systematization of data to build a useful resource. For a project to be crowdsourced, it has to be possible for people to participate in collecting the data. In fact, most of the projects required no registration, or only a very basic one, in order to contribute in some way.

Data crowdsourcing projects tend to be conceived as civic action. For example, see the way in which Sparkrelief conceives their own action, which I found so meaningful. In their own words:

“Our Promise:
1. We see a world where people are empowered to help one another during disasters
2. We seek to replace complexity and despair with simplicity and hope
3. We promise to do our best to “Empower You to provide disaster relief” and hope you will join us in creating a better world
4. If you believe what we believe, let’s change the world together”

I adopted the term “data crowdsourcing projects”; however, different terms are used to refer to this form of collective action, such as “peer to peer”, “online creation communities” or “commons-based peer production” in academic circles, and “power of the people” by some of the projects.

Data crowdsourcing has commonalities with crowdfunding, as both are based on aggregating contributions. However, while the first is based on contributions of data, the second is based on contributions of money. I did identify initiatives based on crowdfunding (such as Occupy: Sandy Relief NYC, which collected money through Wepay). However, I did not include them in the list of data crowdsourcing projects, since these initiatives were clearly crowdfunding and not pure examples of data crowdsourcing.

In total I identified more than 50 projects based on engaging the crowd for #Sandy relief. Of these, 20 projects have engaged at least some crowdsourced data, even if only a minimal amount. This is not at all a complete list, nor is it a representative sample, but observing these cases could give us some insight into this phenomenon.

Typology of crowdsourcing data projects

I would differentiate four types of data crowdsourcing projects.

1) There are crowdsourcing projects which are based on adding contributions to a common-pool. The common-pool adds value by providing a new “picture” or new knowledge of an issue. This is the case, for example, of projects based on Crowdmap (a collaborative mapping tool), where people contribute reports that are added to the common map. In this type of project, the common-pool is the primary goal.

2) A second type is based on contributors performing small tasks in order to achieve a very large piece of work. This is the case of Damage Assessment HOTOSM, which provides a tool for categorizing images of building damage, allowing the Federal Emergency Management Agency (FEMA) to prioritize its efforts.

3) A third type of data crowdsourcing results from synergistic communication. That is, individual communication practices use a common protocol, and as a side effect this results in the building of a common-pool. This is the case of Instacane, a pool of Sandy pictures based on channels from Sandy’s Instagram feed.

4) A fourth type is based on creating a tool that helps match needs with people available to help. In this case, the common-pool is created primarily to create connections: the value lies not in the common-pool itself but in the connections created. This is the case of data crowdsourcing projects such as #SandyAid & Sandy’s list for non-emergency assistance, based on collecting messages about needs for help and willingness to help.

List of data crowdsourcing projects and analysis: Typology of promoter of the tool, typology of technology, typology of contribution, and level of usage.

What did I actually do? Basically, I designed a database by listing the projects and collecting data for each project regarding typology of promoter of the tool, typology of technology, typology of contribution, and level of usage or crowd engagement.

There are very different types of promoters who set up data crowdsourcing projects, including: local newspapers (with a crowdmap, for example) and mainstream media (such as Huffington Post Sandy Stormwatch), individual citizens (such as @noneck with a map of coworking places in NYC), networks of local citizens (such as the crowdmap of Catskill), networks of hackers (which is the case of #HurricaneHackers and the tool SandyTimeline: A timeline of the key events as Hurricane Sandy unfolds), communities around citizen science (such as The Public Laboratory of Science & Technology), university departments (such as the tool Tweak the Tweet Sandy Map of the University of Washington), and government agencies (which is the case of Fairfax County Reporting Map).
There is also the case of a map developed by a corporation; however, it is not based on crowdsourced data. It is possible to send feedback, but not possible to add data to the map.

It is also curious to observe that some of the promoters are not only linked to Sandy projects, but were already active before, as part of continuous efforts to help in moments of natural crisis. Examples are Crisis Commons; Vtresponse, a citizen initiative from the Green Mountains (Vermont) that was previously activated for Hurricane Irene; and Sparkrelief, which was created for the Colorado wildfires.

In terms of typology of technology, most of the experiences are based on adopting and combining existing tools rather than developing new ones. It is also the case that new ones, such as some of those being developed in the timeframe of #HurricaneHackers, are not yet ready to use. The tools currently in use for mapping are diverse: Crowdmap, Google Maps, SeeClickFix, OpenStreetMap, and MapMill; other projects are based on Google sheets with diverse types of interfaces.

In terms of typology of contribution, it depends on the type of data crowdsourcing project, but most give the possibility of adding only a very restricted and guided type of data, which is inserted into a very structured environment. In other words, what can be done and contributed to the common-pool is very structured. In regard to territorial scope, a large majority of the tools cover the whole area affected by Sandy, while others are restricted to a particular country or city.

Finally, in terms of the level of usage and crowd engagement, I get the sense that there is a very big gap between the projects that are “data naked” (that is, there is only the tool but no data; these have not been used at all and were not selected for my list) and the ones that have data, even if just a little. The contributions range from a few, such as 10 data points (SeeClickFix), to some, as many as 400 (as is the case of Sparkrelief: Hurricane Sandy Relief Efforts), to many, such as 1012 contributions (as is the case of Tweak the Tweet Sandy Map of the University of Washington).
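For readers who think in code, the database described above could be sketched roughly as follows. This is only an illustration in Python, not the actual list: the field names and the helper engaged_projects are my own assumptions, fields that this post does not state are left empty, and the last entry is a made-up “data naked” example. The three counts are the ones mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRecord:
    """One row of the project list; field names are illustrative assumptions."""
    name: str
    promoter_type: Optional[str]      # e.g. "university department"; None if not stated
    technology: Optional[str]         # e.g. "Crowdmap"; None if not stated
    contribution_type: Optional[str]  # None if not stated
    data_points: int                  # level of usage / crowd engagement


def engaged_projects(records):
    """Drop "data naked" projects (no data at all) and rank the rest by usage."""
    with_data = [r for r in records if r.data_points > 0]
    return sorted(with_data, key=lambda r: r.data_points, reverse=True)


records = [
    ProjectRecord("SeeClickFix", None, "SeeClickFix", None, 10),
    ProjectRecord("Sparkrelief: Hurricane Sandy Relief Efforts",
                  "specialized group", None, None, 400),
    ProjectRecord("Tweak the Tweet Sandy Map",
                  "university department", None, None, 1012),
    # Hypothetical "data naked" project, included only to show the filter.
    ProjectRecord("Hypothetical empty map", None, None, None, 0),
]

for record in engaged_projects(records):
    print(f"{record.data_points:>5}  {record.name}")
```

Keeping one record per project with the same few fields is what makes it possible to compare promoters, technologies, and usage side by side.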

Which projects are able to engage the crowd?

From observing the data, the impression I get is that localized projects are better able to engage crowdsourced data. By “localized projects” I mean that the scope of the tool is local (a town or a city, and not the overall region affected by Sandy) and that the promoter is also locally rooted in an existing community. In these cases, the tool is not presented “alone”, but tends to be part of or linked to a blog or other communication channels that were already active and had an audience before #Sandy.
Promoters with high visibility or trusted authority also seem to be able to engage the crowd.
Additionally, specialized groups (such as Crisis Commons and Sparkrelief, which were already active before #Sandy arrived) seem to be good promoters of crowdsourced data projects.

In sum, there is not one formula, but several formulas that seem to work for engaging the crowd.
In contrast, projects that seem to be “alone”, without connection to previously existing communities or without association with trusted and visible actors, seem to face difficulties in engaging participation.
In other words, the institutional design and the connections through which the projects are promoted seem to be a very relevant factor in being able to engage the crowd. Furthermore, visibility seems to be a key resource.

Effective, easy-to-use technology with a useful scope is far from being enough to engage the crowd.

Another aspect that seems to be present in the projects with more data is that collaborative data collection tends to start after an initial centralized push (by one person or the promoter of the tool) to add data to the common-pool.

Hopefully this preliminary analysis provides a first taste of possible research into data crowdsourcing projects in crisis contexts. In the coming weeks I plan to develop a more accurate analysis of the data, in the hope that it sheds more light on the conditions that favor crowd engagement.

Here you may visit the list and see the projects, their characteristics, and the level of crowd engagement, to arrive at your own conclusions and analysis. Ah! And please add new cases if you have identified them! This is a data crowdsourced project too!

Mayo Fuster Morell

P.S. Some thoughts on the bigger picture

P.S. 1: Sandy has opened up the debate of State versus market in the Presidential election. Several political analysts have pointed out how much #Sandy has pushed back the anti-state and pro-privatization discourse, as Sandy created a context in which the action of the State is very present in helping citizens recover from the big hurricane. As a Spanish journalist put it, “big hurricane, big state”. What about the commons as a third model of resource production and management? Certainly, commons-based formats are growing in importance, as Yochai Benkler argued in his book, “The Wealth of Networks,” with examples such as Wikipedia and free and open source projects (FLOSS). Data crowdsourcing in crisis contexts is a new area in which we are seeing commons-based peer production emerging. Still, with Sandy we have seen that citizens self-organizing, supported by technology, to solve common needs in a crisis context has great potential but is still in its initial stages. According to the terms of several of the projects I collected, the efforts of data crowdsourcing projects cannot be given priority over those of governmental agencies such as FEMA.

P.S. 2: We have seen how technology is channeling actions of helping each other in a context of natural crisis. However, natural crisis is not the only type of crisis that the USA (and my country, Spain) is facing. Do these experiences tell us something in regard to citizens’ collaboration to recover from economic (and political) crisis? Is this a path to changing the world by helping each other?