Project Overview


The proposed project benefits farmers, foresters, environmental decision-makers, and relief workers worldwide who can access the Internet, by providing them targeted applications bundled with relevant, location-specific data within an online workspace they can share with others. It partners with UN member countries, CGIAR centers, development agencies, the Global VSAT forum and others to identify and complement efforts like India's "e-Choupal", Benin's remote outreach centers and World Food Programme's field communications efforts that connect remote communities and development projects to the Internet.

Modeled on the highly successful "bottom-up" approach taken by the US Globe program for Earth science education, a configured bundle provides users a multilingual portal rich in data about the area where they live. Using modem levels of connectivity, users gain access to terabytes of terrain, soils, water, temperature, vegetation index, MODIS, TRMM, CBERS, Landsat, and other GIS data. Users can also digitize areas of interest directly within a web application, and annotate, upload, and share georeferenced field notes and measurements with other authorized individuals. Tools for crop and forest modeling and for erosion and water quality management are provided for manipulating this data within a user's workspace.

They also gain access to a global network of experts and project managers who can provide real-time, interactive advice over the Internet, and a powerful, easy-to-use mechanism for publishing and subscribing to data generated by peer groups. Identity management, audit trails, and structured workflow enable a wide variety of enforceable, sustainable value propositions to be constructed. A customizable "market module" template is provided to bootstrap direct market agreements and rudimentary supply chain certification using web, cellular SMS, RFID and barcode-based interactions between buyers, farmers, and other participants in a regional economy, where no viable alternatives exist.

Unlike e-Choupal or the Globe effort, the primarily open-source, pre-packaged software framework can be  replicated and expanded by participating institutions for the price of commodity CPU, storage, and bandwidth.

Phase one of this project focuses on identity management, data registration, peer-to-peer content discovery, and automated aggregation into a simple "clip, zip, and ship" mechanism for "fat" clients, and an OGC web mapping content document for online services. Phase two focuses on enhanced client-server interactions with the portal, batch-oriented applications and analysis that run at the portal itself, and outreach.



India's "e-Choupal"



Contents

Strategic Positioning

Rural Connectivity and Sustainable Agriculture
Servicing The "Pixel Inhabitants"
Local Knowledge
Orchestrating International Efforts
Creating Value Chains
Spatially Explicit Information for Lenders

Foundation Software

Open Source Portal Base
Federation
Image Processing and Spatial Analysis
Market Module
Commercial Software

Data

Baseline Provisioning
Additional Data

Distributed Workflow
Subsetting
Outputs

Metadata

Index Creation
Publishing Mechanisms
Access Control
Services

Identity and Delegated Authority

Trust
Certificates
Authenticated Field Inspectors

Applications, Distance Learning, and Outreach

Fat Clients
Financial Calculations
Image Processing
Shared Desktops and VOIP
Run-time Access Control
Distance Learning

Simulation and Decision Support

Crop Modeling
Hydrology
Integrated Processes

Semantics

Taxonomic Support
Tools

Collaborative Development and Research

Positioning
Site Validation Efforts
Commercial Partnerships
University and NGO research
Crop Models
Farm-level Economic Models
Knowledge Structuring
High Performance Computing
Project Activities








Strategic Positioning

Rural Connectivity and Sustainable Agriculture
Linking rural areas to the world with cell phones and the Internet is a powerful agent of change. A dramatic example is India's "e-Choupal" system, which reaches millions of farmers in over 11,000 villages. By directly linking farmers to markets, it removes middlemen and benefits local villages with higher prices. An accelerating number of similar activities worldwide are realizing the potential of wireline modems or inexpensive very small aperture satellite terminals (VSATs) to benefit rural communities. USAID has published a thorough discussion of the potential for Information and Communication Technologies (ICTs) in rural agriculture.

The proposed effort complements this trend by providing these rural users with powerful tools and data to more effectively realize sustainable agriculture and share geospatial information about  where they live. Beyond market prices, the proposed framework provides data, tools, and IT infrastructure to address location-specific questions about planting time, irrigation, tillage, pesticide and fertilizer application. It also provides a thin-wire framework to rapidly provision existing assets, order new imagery, and coordinate activities after catastrophic events by providing geographically  aware newsgroups, discussion lists, and shared workspaces.

Servicing The "Pixel Inhabitants"

Careful positioning of this project by the United Nations Food and Agriculture Organization (FAO), the Consultative Group on International Agricultural Research (CGIAR),  USAID and other development agencies can establish, at minimal cost, a critical “neutral ground” to coordinate and harness relevant remote sensing and information system development efforts for the benefit of developing countries, at the individual farm and watershed level. Indeed, that has been the justification for major expenditures on the part of CEOS members.

Major related efforts include:

Local Knowledge and Community

CEOS members operate several interesting orbital platforms; FEWS, GMES, and JRC create interesting synoptic information products on ever more powerful supercomputing grids; NASA and ESA fund extremely interesting research prototypes demonstrating the utility of their Earth Observation products. But the vast majority of rural communities have had little prospect of benefiting from these advances until now, as the Internet starts to reach their towns and villages. Even then, bridging the gap between CEOS member prototypes and UN member populations will require extensive participation by FAO, CGIAR, WFP, USAID, IBRD and their close affiliates, such as RCMRD and RECTAS. They are the organizations that understand the regional economics, and have detailed, structured "local knowledge" about physical processes observed by the CEOS members. They are the organizations with sustained field presence and deep "local knowledge" of issues important to day-to-day operational decision makers, and of how that knowledge does - and doesn't - mesh with information and communication technologies (ICTs). They are indeed the organizations that can fully harness CEOS capabilities to address UN member country issues in sustainable agriculture and poverty alleviation in a well-coordinated, comprehensive, global fashion, and effectively coordinate a Global Land Cover Test Sites Project for "the masses" back to CEOS members.

Orchestrating International Efforts

To achieve this goal, FAO, USAID, World Bank, and CGIAR can orchestrate the creation of baseline, open-source Internet 'portal' bundles of relevant data and tools, which can be freely downloaded, and position the overall effort as their own interoperability testbed. They can leverage their strategic OpenGIS membership to drive standardization of trade-offs suitable to their stakeholders. Because this effort coordinates agriculture-related IT development and remote sensing, and is well aligned with activities such as NASA SEEDS and Canada's GeoConnections, it can provide an umbrella framework for relevant national research that benefits all member countries, yet still provide innovation potential to individual research efforts.

Most importantly, they can focus GIS standardization - and procurement - to achieve their objectives: sustainability and the Millennium Development Goals.

Spatially Explicit Information for Lenders
Because the framework is geographically referenced and semantically rich, it provides multilateral lending institutions a unique opportunity to develop and evaluate spatially explicit impact assessment models. Locally maintained household data and interaction models can be contributed by in-country officers and incorporated into comprehensive frameworks like FEWS. This will also provide lenders baseline statistics for compliance with the Pelosi Act, Equator Principles, and World Bank environmental safeguards.

Creating Value Chains

The effort can also take the lead in fostering value chains between users within the overall federation, by establishing a simple standard for tracking usage of geographic content and algorithms, similar to telecom "call detail records" (CDRs). These CDRs are the essential ingredients for creating enforceable "credit" systems between stakeholders: in-kind bartering and value chaining of a wide variety, complementary to run-time contract protocols such as the proposed OpenGIS Web Pricing and Ordering System (WPOS) or Electronic Business XML (ebXML) Trading Partner Agreements (TPAs). The XML-based IP detail record (IPDR) specification will be considered as the potential standard for recording value exchange between authenticated users, content and service providers.

An example of such a value chain might be different UNEP programs that benefit from accessing high resolution SPOT imagery directly from the Vito-managed archive. A user might display several "standard image unit" 120,000-pixel screens, one of which involved an ESRI ArcGIS Spatial Analyst license for 1/2 CPU hour to estimate Cambodian deforestation processes. Each of these items, the 1/2 CPU hour and the SPOT data, could be reconciled against different specific funding programs: the ESRI conservation GIS program, a World Wildlife Fund program for hardwood forest management, and SPOT Image. Once CDR accounting and reconciliation processes are accepted by stakeholders, the path is open to value chaining, credit, settlement protocols, and bartering of geospatial information and services between federation members.

CDRs can be reconciled in much the same way SWIFTNet and VISA reconcile the overall inter-bank balance of payments between their members, or telcos reconcile CDRs and share revenues on calls that span multiple carriers. A simple, standardized CDR format and authenticated reconciliation process will provide an incentive for high-value online geographic service providers, such as GlobeXplorer, MapPoint, and ESRI's Geography Network, to participate.
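As a rough illustration of the reconciliation step, the sketch below totals usage records per funding program and provider. The record fields (`user`, `provider`, `resource`, `units`, `program`) and the sample values are hypothetical. They are not taken from the IPDR specification or from any stakeholder agreement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical usage detail record (field names are assumptions)."""
    user: str        # authenticated user who consumed the resource
    provider: str    # content or service provider owed credit
    resource: str    # what was used, e.g. imagery or CPU time
    units: float     # quantity consumed, in the resource's own unit
    program: str     # funding program the usage is charged against


def reconcile(records):
    """Sum the units each funding program owes each provider, per resource."""
    balances = defaultdict(float)
    for rec in records:
        balances[(rec.program, rec.provider, rec.resource)] += rec.units
    return dict(balances)


# Illustrative records only; names and quantities are made up.
records = [
    UsageRecord("analyst1", "SPOT Image", "image_units", 2.0, "forest-program"),
    UsageRecord("analyst1", "ESRI", "cpu_hours", 0.5, "conservation-gis"),
    UsageRecord("analyst2", "SPOT Image", "image_units", 1.0, "forest-program"),
]

for (program, provider, resource), units in sorted(reconcile(records).items()):
    print(f"{program} owes {provider}: {units} {resource}")
```

A real settlement process would also need authenticated records and agreed prices per unit, which this sketch leaves out.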

Customer Intelligence

Billing records will be part of the more comprehensive customer behavior profiling and data mining system. Detailed "click records" and overall site statistics will be processed using the popular business intelligence Weka suite of machine learning and predictive behavior algorithms. These statistics will be kept in strict accordance with the delegated authority identity schemes, and made available to users, or expunged, along that chain upon request. In particular, extreme care will be taken to observe privacy, and avoid politically explosive situations that might arise between sovereign governments and NGOs, or UN agencies. However, users will be advised that information may be shared with authorities in abusive or extreme circumstances. Logs of email threads will also be processed using the NetVis social network visualization suite, and made available to "community managers" on a per-group basis.

Open Source Portal Base
Software development costs will be minimized by leveraging a wide array of mainstream open-source software. Major enhancements or new subsystems of the platform can be undertaken independently by universities or agencies worldwide. At the portal's core will be a fully featured, multilingual enterprise-class framework capable of content management, community building, and workflow execution, whose internal object classes have spatial attributes, filtering, and aggregation capability. Built on JBoss and the rich persistent object framework Hibernate, the portal will support rapid application development (RAD) within the Java portlet and companion JavaServer Faces (JSF) frameworks. Additionally, all transactions (content uploads, data provisioning, user sign-ups, subscription service processing, etc.) will be performed using the JBoss Java Business Process Management (jBPM) workflow engine. The design philosophy will generally follow the component model interfaces of the "Geospatial One-Stop" effort, but take liberties in favor of design elegance and "doing the right thing," particularly given the strong emphasis on multilingual support, taxonomic categorization, and structured workflow.

The arena of full-blown, Java-based, open-source content and portal offerings using these core packages (JBoss, jBPM, Hibernate, JSF) is changing extremely rapidly. A selection process will be undertaken among the major platforms, JBoss Portal, Alfresco, Jackrabbit, and Nuxeo in particular, due to their backing by major corporations, RAD GUI design tools, and design requirements for scalability and mission-critical qualities of service. All have scriptable content ingest, metadata extraction, version control, and role-based lifecycle management. All have discussion lists with delegated administration by moderators.

Foundation Desktop Software

The overall project will be developed and maintained as a constantly evolving, modular 'reference platform', orchestrated by FAO, CGIAR, national agencies, NGOs, and universities worldwide, relentlessly driven by the requirement to be genuinely useful and cost-effective for agricultural decision-support using appropriate technology. In its simplest and most important form, the portal will enable users to search, aggregate, and subscribe to useful data for areas of interest from a wide variety of sources. This data will be made available as online KML and OGC layers, or aggregated into download chunks via a mechanism similar to the USGS Seamless server. The baseline viewer for applications will be Google Earth, with overlays streamed as either KML vectors or ground overlays. Users can also "clip, zip, and ship" data downloads to use in existing desktop applications (such as CGIAR tools, the USDA/Forest Service Forest Vegetation Simulator, FAO's WinDisp, ADDAPIX, CROPWAT, CLIMWAT, SIMIS, etc.) or manipulate it within a "workspace" at the portal itself on a remote desktop. Aggregation of content will attempt to optimize overall CPU, bandwidth and storage utilization by employing batch-oriented subsetting and lossless compression of the areas of user interest as far "upstream" in the overall workflows as possible.

Robust identity management will enable user groups to purchase and use their own pools of floating license tokens. In this way, users with modest connectivity (such as a VSAT terminal) can access fully licensed ESRI, ERDAS, ENVI, and other products, which have local access to large data sets. Because nearly all of these products use the Flex-LM license management scheme, detailed license usage records are available. A translator from these logs to IPDR format will be created, to support 'software as a service' style cost-accounting.
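The first half of such a translator, pairing license check-outs with check-ins to obtain usage durations, might look like the sketch below. The log line layout (`HH:MM:SS OUT|IN feature user@host`) is a simplified assumption for illustration. Real license manager logs differ, and the conversion of these records to IPDR XML is not shown.

```python
import re
from datetime import datetime

# Hypothetical, simplified log layout: "HH:MM:SS OUT|IN feature user@host"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (OUT|IN) (\S+) (\S+)$")


def usage_records(lines):
    """Pair each check-out with its check-in and return usage durations."""
    open_sessions = {}
    records = []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        stamp, action, feature, user = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        key = (feature, user)
        if action == "OUT":
            open_sessions[key] = when
        elif key in open_sessions:
            seconds = (when - open_sessions.pop(key)).total_seconds()
            records.append({"feature": feature, "user": user, "seconds": seconds})
    return records


# Illustrative log lines; feature and user names are made up.
log = [
    "09:00:00 OUT spatial_analyst farmer1@vsat-node",
    "09:30:00 IN spatial_analyst farmer1@vsat-node",
]
print(usage_records(log))
```

The sketch ignores sessions that span midnight and concurrent check-outs of the same feature by the same user. A production translator would have to handle both.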

Content Sharing, Search and Ranking

It is also generally acknowledged that the "find, use, share and extend" (FUSE) model of user-contributed content has tremendous value if properly structured, Wikipedia and Craig's List being spectacular examples. A primary goal of this effort is to facilitate FUSE use cases for geographic data with semantic interoperability. Several of the previously mentioned content management systems employ a variety of these techniques. The particulars of these systems will be examined in relation to the customized indexes generated for geographic data dictionaries. While Craig's List has enjoyed success in part due to a fixed taxonomy for each locale, Wikipedia's support of indexing by different, concurrent taxonomies has proven effective, giving renewed meaning to the phrase "a rose is still a rose by any other name."

At the same time, the ability of the big portals (Google, Yahoo, Microsoft, AOL, and ASK) to offer this mostly government-funded data uniformly as an easy-to-use, subsidized service with mission-critical quality of service has generated exponential growth in actual usage by end users. Making peer-generated geographic data this easy to use - while supporting extremely diverse schemas - is a primary goal of this effort. Right away, users will need to decide whether the data they are sharing can be "self-hosted" (i.e. their institution has the ability to publish it, and the portal's utility is mostly registration, indexing, publishing and fusion), or whether the data is to be replicated and hosted at the site itself. The decision to rehost data will depend on a combination of level of service, content popularity, and "importance." While some data might not be accessed very often, it may be very "important" to have it online with a guaranteed quality of service. The depth of sea ports in southeastern Africa might not be important until a famine situation is imminent, or air fields in Sumatra until a tsunami strikes.

Clearly, usage patterns are one factor in ranking.

Developing effective methodologies to categorize, uniformly index, and search geographic data from heterogeneous sources has always been difficult. Beyond bounding box and simple text descriptions, the long history of the FGDC metadata working groups and clearinghouse efforts is a testament to this difficulty, and the near total obscurity of their results a testament to their overall effectiveness. It is very difficult to create and maintain geographic data that is fully FGDC "compliant", and therefore not much exists. However, what does exist is well defined, "actionable" information, by virtue of rigorous control of terminology and metadata. Clearly, the race is on to revisit the basics armed with new search and indexing tools and techniques. Beyond the handful of simple categories enumerated in ISO/TC211 19115 B.5.27, a strong emphasis will be placed on rigorous categorization of content, groups, and workflow items according to well-known taxonomies and controlled vocabularies, beginning with land cover data according to the FAO LCCS specification. But at least a "minimal set" of the essential FGDC elements and ISO 19115-compliant metadata will be required of any business process attempting to publish spatial collection-level information.

To address these issues, mainstream full-text and next-generation semantic search engine techniques will be used to enhance traditional geographic data metadata. In particular, both the descriptions and data dictionary items of registered data will have full-text search indexes routinely built using standard Lucene, able to cross-reference loosely "tagged" content such as the increasing amount available via GeoRSS feeds. Beyond this, uploaded shapefile text columns will also be crawled and cross-referenced with the eight million geographical entries of GeoNames. Given sufficient bandwidth, registered feature services themselves will also be crawled and indexed in a tiled manner.
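The idea of indexing both descriptions and data dictionary items can be sketched with a plain inverted index. This is not Lucene and does no ranking. It only shows how column names such as `road_class` become searchable terms alongside the free-text description. The dataset names and fields are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(datasets):
    """Map each term to the set of dataset ids whose text contains it."""
    index = defaultdict(set)
    for dataset_id, fields in datasets.items():
        for text in fields:
            for term in tokenize(text):
                index[term].add(dataset_id)
    return index


def search(index, query):
    """Return ids of datasets containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


# Hypothetical registered datasets: a description plus data dictionary columns.
datasets = {
    "roads_peru": ["Road network of Peru", "road_class", "surface_type"],
    "soils_kenya": ["Soil units of Kenya", "soil_type", "drainage"],
}
index = build_index(datasets)
print(search(index, "road peru"))
```

A production index would add stemming, ranking, and language handling, which a full-text engine provides and this sketch does not.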

Ranked text searching has obviously found spectacular success with Google's 'page rank' algorithm, now embodied in such Lucene subprojects as Nutch and Hadoop, in use at most of the search portals. The importance of semantic capability is also becoming clear. The levels of funding going into startups like Radar Networks hint at the next generation of mainstream search.

Specifically with respect to geographic data, nobody would dispute that a bounding box search for "roads" should yield highways, streets, or unimproved asphalt. Yet such "common-sense" logic eludes full-text searching. The same is true for multilingual search; an emergency relief effort in Peru looking for 'roads' should be able to find features such as "caminos".

To address these issues, support for semantic indexing of data dictionary items will be provided using a subset of the OpenCyc common sense ontology, with additional, specialized ontologies developed within the OWL-Lite profile of Jena. Upon registering and/or uploading their data, users will be provided ranked semantic search results as starting candidates within a stripped-down version of IsaViz, and encouraged to refine their definition within RDF "is a" and "has a" properties for their tables and columns. A few baseline schemas will be established early on, starting with NASA's SWEET project, and those deemed useful after a review of the massive GILS effort. For completeness, the RDF of GML itself will be included, as a testament to its user "friendliness".
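The effect of "is a" relations on search can be shown with a small sketch. A query for a broad concept is expanded to all narrower terms before layers are matched, so a search for roads also finds layers declared as highways or caminos. The hand-written relation table and layer names are assumptions for illustration. A deployment would read the relations from an ontology rather than a dictionary.

```python
# Hypothetical "is a" relations; a real deployment would draw these from an
# ontology rather than a hand-written dictionary.
IS_A = {
    "highway": "road",
    "street": "road",
    "camino": "road",
    "road": "transport_feature",
    "railway": "transport_feature",
}


def narrower_terms(concept):
    """Return the concept plus every term that transitively 'is a' concept."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def semantic_search(layers, concept):
    """Return layer names whose declared feature type falls under concept."""
    wanted = narrower_terms(concept)
    return sorted(name for name, kind in layers.items() if kind in wanted)


# Illustrative layers, each tagged with one feature type by its publisher.
layers = {
    "caminos_peru": "camino",
    "us_interstates": "highway",
    "rail_lines": "railway",
}
print(semantic_search(layers, "road"))
```

Treating the Spanish term as a narrower term is a simplification. A fuller model would record it as a label of the same concept in another language.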

Federation
Early attention to RDF and underlying Jena support should make adherence to any ontologies emerging from the GEOSS catalog effort straightforward. Installed instances of the platform are intended to federate seamlessly, by supporting standard metadata protocols, registering in appropriate catalogs (ECHO, GCMD, UNEP.net, Geonetwork, ESA assets, etc.) and being structured as regional content centers with delegated authority over a certain polygonal area of the Earth's surface.
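One way to picture delegated authority over polygonal areas is a lookup that routes a location to the regional content center whose polygon contains it. The sketch below uses a standard ray-casting point-in-polygon test. The node names and rectangular outlines are hypothetical, and the planar test ignores complications such as the antimeridian.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            crossing = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < crossing:
                inside = not inside
    return inside


def responsible_node(lon, lat, nodes):
    """Return the name of the first node whose polygon contains the point."""
    for name, polygon in nodes.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical regional content centers with rough rectangular areas.
nodes = {
    "east-africa": [(28.0, -12.0), (42.0, -12.0), (42.0, 5.0), (28.0, 5.0)],
    "andes": [(-82.0, -20.0), (-68.0, -20.0), (-68.0, 2.0), (-82.0, 2.0)],
}
print(responsible_node(36.8, -1.3, nodes))
```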

Mapping and Image Processing
The foundation rendering component for spatial content will be the Minnesota Map Server using the Java version of its standard interface objects. Map customization will be performed according to a user's profile by creation of "on-the-fly" mapfiles. Vector content will be stored in the PostGIS extension to the PostgreSQL database. Raster content will be stored in lossless JPEG 2000 format using the Kakadu software suite. A thin-wire applet base class that supports uploading ESRI Shapefiles, comma-separated value (CSV) point files, or user-drawn overlay geometry with attributes will be provided, based on either the ROSA applet or an equivalent. The GRASS i.* and r.* suite of programs will be used for server-side image processing and cell-based raster analysis services in conjunction with a controlling client-side applet, or data can be downloaded and used locally within several existing FAO desktop applications.

Another important component is support of field mapping and surveys using offline PDAs and smart-phones. A rudimentary workflow will be provided to build a forms-based field survey unit, similar to ESRI's popular Go! Sync, that uses OpenSync and supporting scripts to synchronize with a user's shared workspace.

Market Module
Linking growers to markets is a critical element of rural Internet connectivity. Although it is expected that this function will most likely be accomplished by pre-existing systems or major complementary efforts, a simple market bidding and transaction service based upon jBPM will be developed to help "bootstrap" those areas where nothing else exists. A concerted effort will be made to link with emerging ICT-based mechanisms such as Tradenet, and facilitate extensions to the World Bank's dgMarket module using jBPM workflows, with a gateway to cellular networks using the JBoss Mobicents framework.

Commercial Software
Platform development will place a strong emphasis on standard interfaces. In particular, persistent content stores will be able to accommodate ESRI's spatial data engine (SDE), or Oracle spatial cartridge by virtue of support within Minnesota Mapserver and JBoss. Raster image processing services should also be supportable by server-side ERDAS, PCI, ENVI, or ERMapper functionality, if suitably wrapped within the Java portlet infrastructure and overall model-view-controller (MVC) framework. As previously mentioned, most of these utilize the Flex-LM license manager scheme, whose logs can be processed into billing records to support "software as a service" accounting.

Data

Baseline Provisioning
To realize "out of the box" value for local decision-makers, it is essential that relevant, timely, reliable, free data be available. Datasets identified to meet these criteria are the global MERIS and Landsat mosaics, real-time MODIS vegetation index, MODIS surface temperature, TRMM rainfall measurements, and SRTM products. Provisioning a user's area of interest will result in assembling a time-series of subsetted temperature, vegetation index, and historical Landsat imagery for that area in an offline, batch-oriented mode. These datasets were selected because they are capable of creating the minimal data set necessary to run the DSSAT CERES and CROPGRO models. An additional derivative product, bi-weekly vegetation index changes, has also been deemed of considerable interest. These will all be made available on-demand via a standard provisioning jBPM workflow. This baseline can be fully automated, since all NASA DAAC holdings are registered with ECHO along with associated access methods.
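The derivative product mentioned above, changes in vegetation index between consecutive composites, reduces to differencing a time series. A minimal sketch for a single location follows. The dates and index values are made up, and the actual product would be computed per pixel over the subsetted area.

```python
def biweekly_changes(series):
    """Return (date, delta) pairs between consecutive composites.

    series is a list of (date_string, index_value) tuples in time order.
    """
    return [
        (later_date, round(later - earlier, 4))
        for (_, earlier), (later_date, later) in zip(series, series[1:])
    ]


# Hypothetical vegetation index values for one location.
series = [("2006-01-01", 0.31), ("2006-01-15", 0.36), ("2006-01-29", 0.33)]
print(biweekly_changes(series))
```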

Beyond these free satellite sources, it will be essential to incorporate the scanned topographic holdings of major providers, such as East View Cartographic, Landinfo, and OmniMaps, if truly useful coverage is the goal. As discussed in Value Chains, a critical part of this effort is to provide a sustainable accounting mechanism that allows these private assets to be offered online, and allows relief, aid, and development projects to fund the ongoing scanning and maintenance of these private maps. Indeed, in many parts of the former Soviet Union and elsewhere, maintaining the mapping data is the agency's charter, and "giving it away" is utterly antithetical to its culture. Nothing will foster the flow of information from those who have valuable information to those who need it like an enforceable accounting system.

A simple tool for discovering and downloading data held at the portal, which uses OpenGIS wire protocols and propagates identity via IETF RFC 2617, will be developed from one of several open-source client efforts already underway in the OpenGIS community. User accounts created by a delegated authority will be given a disk quota, and provided an applet-based drawing tool for uploading ESRI shapefiles or freehand input of georeferenced, attributed polygons, points, or outlines. Users can upload field measurements, notes, digital pictures, etc. about that point or area. Collections of points with associated scalar values can be used to create isobar contour surfaces.

Beyond remote sensing data, a concerted effort to aggregate, load, and link to relevant, useful free datasets will be ongoing. Obvious sources are FAO soils databases, UNEP.net, AQUASTAT, Digital Chart of the World, and Africover data. As previously mentioned, the mapping server will have the ability to act as both an OpenGIS client and server for WMS, WCS, and WFS protocols. Node managers can choose to download and cache datasets that require either higher performance or higher qualities of service than remote access provides.

Additional Data
Any additional baseline data the site might acquire, such as aerial photography, can be added to the user workspaces. Because the underlying Minnesota Map Server can utilize remote data sources using the OpenGIS Web Mapping protocol, users can also access remote data products from GlobeXplorer content partners, or other federates. Of particular interest will be free CBERS data throughout Africa and Brazil, NASA's OnEarth, as well as data from the ESRI Geography Network. ESA's Vito-managed archive has also indicated a willingness to participate, as have a few other Landsat 7 operators.

Customized jBPM workflow efforts will be encouraged to capture highly specific data, particularly localized meteorological observation capability.

Installed nodes might also become a focal point and distribution mechanism for content generated by members of the International Steering Committee on Global Mapping. An identified area for enhancement of the reference platform is integration with the emerging OpenGIS "SensorML" specification for in-situ measurements.

Finally, suitable market price information feeds will be explored.

Distributed Workflow
Because of the enormous volumes of several core raster datasets, data provisioning will necessarily be a carefully orchestrated process, designed to optimize overall CPU, bandwidth, and storage requirements. As such, as much "upstream" processing as possible will occur. For example, MODIS EVI and temperature tiles will have their desired HDF slices extracted, and then be mosaiced, reprojected, and losslessly compressed before transmission to a content node. This will reduce transmission and storage requirements by an order of magnitude. A batch provisioning server, with multiple T3 connections to the EROS data center, has been established specifically for this purpose. This may eventually be co-located at the UNEP/EROS facility.

The embedded workflow engine of the portal will enable modular addition of a wide variety of useful inputs, particularly local meteorological observations. Such data might be retrieved over the Internet, using an automated dial-out modem, or 3270 terminal emulation.

One other important workflow to be supported is remote order fulfillment to Geonetcast terminals, such as an installation in Rwanda. Currently, EUMETCAST and NOAA broadcast mostly regular meteorological and coarse-resolution data (AVHRR, etc.). However, both support prioritized "push" multicasting with subsystems such as Kencast. Both operators have expressed willingness to accept prioritized "push" images from this portal effort. Therefore, given a minimal VSAT 'backchannel', this portal can fulfill a vital role in the overall GEOSS conceptual workflow: aggregating and prioritizing user requests from Geonetcast stations. Indeed, subsetting and Geonetcast rebroadcast will be a critical component to the success of free CBERS data for Africa.

Subsetting
For higher resolution data, especially Landsat imagery, users will be provisioned with suitable subsetted areas instead of entire scenes. Subsetting capability will initially be invoked using secure shell, using the GDAL and subset.org suite of tools. These tools will also be enhanced to support both OpenGIS WCS client and server capabilities, to complement the Minnesota Map Server's existing ability to act as both OpenGIS Web Mapping and Web Feature client and server. A GRID-based execution is currently being prototyped in conjunction with GlobeXplorer's GRID efforts and the LAITS OGC-GRID integration effort. Phase one of this effort would establish such subsetting capabilities within the EROS data center, Goddard Space Flight Center, GlobeXplorer, and the Maryland Global Land Cover Facility. This basic GRID subsetting is intended to be the first practical, open-source realization of the CEOS GRID effort, and to establish technical protocol for interaction with the service elements of the European Union's Global Monitoring for Environment and Security program. Such cooperation may become increasingly important given the age of the MODIS, Landsat and TRMM platforms, and the current launch schedules of Hydros, NPOESS, and other American sensors.
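The core arithmetic of subsetting is converting a user's geographic bounding box into a pixel window of the source raster. A minimal sketch follows, assuming a north-up raster with square pixels. The function name, parameters, and sample values are illustrative and are not part of GDAL or the subset.org tools.

```python
import math


def pixel_window(bbox, origin_x, origin_y, pixel_size, width, height):
    """Convert a geographic bounding box to a clipped pixel window.

    bbox is (min_x, min_y, max_x, max_y); the raster is north-up with its
    upper-left corner at (origin_x, origin_y) and square pixels.
    Returns (col_off, row_off, cols, rows), or None if there is no overlap.
    """
    min_x, min_y, max_x, max_y = bbox
    col_start = max(0, math.floor((min_x - origin_x) / pixel_size))
    col_stop = min(width, math.ceil((max_x - origin_x) / pixel_size))
    row_start = max(0, math.floor((origin_y - max_y) / pixel_size))
    row_stop = min(height, math.ceil((origin_y - min_y) / pixel_size))
    if col_start >= col_stop or row_start >= row_stop:
        return None
    return col_start, row_start, col_stop - col_start, row_stop - row_start


# Hypothetical raster: upper-left corner at 30E, 0N; 0.25 degree pixels.
window = pixel_window((31.0, -2.0, 32.0, -1.0), 30.0, 0.0, 0.25, 400, 400)
print(window)
```

The four values correspond to the kind of source window that GDAL's `gdal_translate` accepts through its `-srcwin` option (x offset, y offset, x size, y size).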

All of these raster datasets will be stored in two forms: a false-color version suitable for web presentation, and a lossless version suitable for scientific simulation. Content management business processes and interactive applications will associate the two atomically in a reusable fashion. An accompanying applet will enable users to interactively "pick" a point on the false-color JPEG rendition in their browser, and retrieve the value of all bands of the multispectral image "backing" the JPEG. In this manner multispectral, multitemporal signatures of a wide variety of phenomena may be collected by users worldwide, easily associated with close-up digital camera images, field notes, or audio recordings, and shared with others.
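The "pick" interaction amounts to reading every band of the lossless image at the pixel the user selected on the rendered version. The sketch below assumes the false-color rendition and the multispectral image share the same pixel grid. The band names and values are made up for illustration.

```python
def pick_bands(multispectral, col, row):
    """Return the value of every band at one pixel.

    multispectral maps band name -> 2D list of rows; the false-color
    rendition is assumed to share the same pixel grid.
    """
    return {name: grid[row][col] for name, grid in multispectral.items()}


# Hypothetical 2x2 scene with three bands.
scene = {
    "red": [[10, 12], [11, 13]],
    "nir": [[40, 42], [41, 43]],
    "swir": [[20, 22], [21, 23]],
}
signature = pick_bands(scene, col=1, row=0)
print(signature)
```

If the web rendition is downsampled, the picked screen coordinate would first have to be scaled to the full-resolution grid, which this sketch does not do.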

Access Control
Security and privacy are essential to gaining sufficient trust from users that they will upload data, and to building community. For example, in humanitarian relief and food security efforts, it may not be useful to widely publish metadata about mortality statistics. Uploaded individual feature data will be tagged with user and group-level information: at the table level for uploaded Shapefiles, and in two database columns of the shared PostGIS table for interactively drawn points and polygons. The latter will be viewable as "virtual" tables with appropriate SQL "WHERE" clause predicates. In this manner role-driven security and access control may be maintained and enforced through all publishing, workflow, and rendering operations. Metadata about "virtual tables" will be periodically harvested from the shared tables.
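
The "virtual table" idea can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than PostGIS so that it is self-contained, and the table and column names (shared_features, owner_user, owner_group) are assumptions made for illustration, not the project's schema.

```python
import sqlite3
from typing import List, Sequence, Tuple


def open_shared_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the shared feature table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shared_features ("
        " id INTEGER PRIMARY KEY,"
        " wkt TEXT NOT NULL,"          # geometry kept as text for the sketch
        " owner_user TEXT NOT NULL,"   # first tagging column
        " owner_group TEXT NOT NULL)"  # second tagging column
    )
    return conn


def visible_features(
    conn: sqlite3.Connection, user: str, groups: Sequence[str]
) -> List[Tuple[int, str]]:
    """Return the rows of the 'virtual table' for one user: the shared
    table filtered by a WHERE predicate on the two tagging columns."""
    sql = "SELECT id, wkt FROM shared_features WHERE owner_user = ?"
    if groups:
        placeholders = ",".join("?" for _ in groups)
        sql += f" OR owner_group IN ({placeholders})"
    sql += " ORDER BY id"
    return conn.execute(sql, [user, *groups]).fetchall()


conn = open_shared_table()
conn.executemany(
    "INSERT INTO shared_features VALUES (?, ?, ?, ?)",
    [
        (1, "POINT(1 1)", "amina", "relief"),
        (2, "POINT(2 2)", "bo", "farmers"),
        (3, "POINT(3 3)", "chen", "relief"),
    ],
)
rows_for_bo = visible_features(conn, "bo", ["farmers"])
rows_for_amina = visible_features(conn, "amina", ["relief"])
```

In a PostGIS deployment the same predicate could instead be stored as a database view per user or group; that choice is left open here.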

Services
Workflows capable of remote transformations can be established as Apache Axis SOAP web services agents, or OGSA GRID agents, the first being subsetting. Such transformation workflows would be initiated according to the ISO 19119 and the OpenGIS Services Architecture protocols, and discovered according to the rapidly evolving OpenGIS Web Registry Services. As previously mentioned, this effort will be closely aligned with GlobeXplorer's GRID efforts and the LAITS OGC-GRID integration effort, and in general, along the lines of ebXML trading partner agreements and the OGC Web services testbed efforts.

Identity and Delegated Authority

Trust
User workspaces need to have suitable authentication and access control mechanisms in place. Data about one's farmland or watershed, particularly its yield capability or contaminants, is extremely sensitive information. It can often be tied directly to loan risk, land valuation, or environmental regulation noncompliance. Gaining users' trust that uploaded data will not be used against them is a very long, difficult process. This is precisely why the initial focus of the portal will be downloaded data for existing "fat" clients. The failed VantagePoint Network had a very difficult time convincing prospective customers that their data was not going to be used by the US EPA, farm credit agency, or anyone else. Clearly this will only be as robust as the practices of the hosting institution, but the overall architecture must support role-based identity management by delegated authorities, and access control of all uploaded data. This will be done by "tagging" all individual uploaded features with identity and access control information. Beyond trust, this is also essential if value exchange is to be promoted among users: information known to be held by one party must be deemed both scarce and useful by another.

Within a single instance of a portal, an overall site administrator will be assigned, who in turn can designate sub-administrators that can create groups and new individual users of their own. A unique feature of the spatial enhancements being made to the Alfresco platform will be its ability to delegate geographic areas to administrators.

Certificates
This effort establishes an X.509 chain of certificate authority with FAO, using Java Keytool. New sites will be issued a certificate, from which they may issue their own login/password credentials. In this manner, individual countries with sufficient technical capacity can either support an entire portal node of their own, or simply be delegated all authority for the chains of identity that fall within their polygonal spatial bounds. Nested delegates can also be established that can in turn issue credentials scoped to a group level. For example, the Ohio node will issue logins for the Midwest, and the University of Cincinnati may be a delegate of that node that issues its own credentials. As previously mentioned, groups may have spatial attributes, so that one can cleanly define a delegate group as a particular county or township. The overall framework used will be the Java Authentication and Authorization Service (JAAS).
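
The nesting rule for spatially scoped delegates can be sketched as below. This is an illustration under simplifying assumptions: it uses rectangular bounding boxes where the text calls for polygons, it models only the chain of authority and not X.509 certificates themselves, and the class name and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (min_lon, min_lat, max_lon, max_lat); a simplification of polygonal bounds.
BBox = Tuple[float, float, float, float]


def contains(outer: BBox, inner: BBox) -> bool:
    """True when the inner box lies entirely within the outer box."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


@dataclass
class Authority:
    name: str
    bounds: BBox
    parent: Optional["Authority"] = None

    def delegate(self, name: str, bounds: BBox) -> "Authority":
        """Create a nested delegate whose area must fall inside this one."""
        if not contains(self.bounds, bounds):
            raise ValueError(f"{name} exceeds the bounds of {self.name}")
        return Authority(name, bounds, parent=self)

    def chain(self) -> List[str]:
        """Names from this authority up to the root of the chain."""
        names: List[str] = []
        node: Optional[Authority] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Illustrative, approximate coordinates only.
root = Authority("FAO root", (-180.0, -90.0, 180.0, 90.0))
ohio_node = root.delegate("Ohio node", (-104.0, 36.0, -80.5, 49.4))
university = ohio_node.delegate("University of Cincinnati", (-84.6, 39.0, -84.3, 39.3))
authority_chain = university.chain()
```

A credential issued by the university delegate would then be verifiable by walking this chain back to the root, which is the role the certificate chain plays in the design described above.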

Authenticated Field Inspectors
Verifiable identity provides a unique opportunity to build georeferenced ground truth data sets about agricultural practices by certifiable role players, particularly agricultural inspectors. Phase one of this effort prototypes a pilot program with Fairtrade Labeling International and the International Federation of Organic Agriculture Movements to highlight this capability, by issuing inspectors digital cameras and GPS units for uploading georeferenced information about coffee growers in the greater Caribbean Basin and Colombia. A methodology for systematically making this trusted information available to wholesalers and consumers, so that they can learn the certified practices that produced their coffee, will be explored, along with strategies to integrate the federated network with UN/CEFACT bill of lading location and function codes. Proper positioning relative to country customs protocols, and liaison with other FAO efforts such as the Codex Alimentarius Commission, will be examined in order to utilize the field information captured by authenticated inspectors.

Applications, Distance Learning, and Outreach

Fat Clients
A large class of portal users will probably not have the ability or desire to stay connected to the Internet all day while they work. Rather, they will most likely connect occasionally, get what they need, and work offline in desktop applications. This class of user will be given the choice to download a "bundle" of free, pre-packaged desktop applications and datasets that can occasionally "synchronize" their workspace with field measurements acquired with GPS-capable harvesters or other means. These tools will include Metalite, CGIAR's tools, various FAO applications, CROPWAT, CLIMWAT, DSSAT, SIMIS and other items deemed useful. The source code for all of these items will be placed in a centralized CVS repository, moderated by FAO and CGIAR. Other precision-agriculture software from commercial vendors, such as ArcGIS, FarmGIS, and SST toolbox, will no doubt be in wide use as well. The bi-weekly updates of MODIS vegetation information, daily TRMM updates, and other specific newsgroups or discussion forums offer the basic incentive to participate in the overall portal community. The centralized content nodes will also be able to stream content on demand to OpenGIS capable clients. Existing "fat client" desktop tools such as WinDisp will be individually evaluated for the level of effort required to directly use OpenGIS WCS and WMS protocols versus downloaded files.

Financial calculations
In addition to SIMIS, a simplistic web-based spreadsheet will be provided to take outputs from applications and perform simple cash-flow calculations for fertilizer, irrigation, pest control, tillage, harvest and transport input costs versus crop transport price. The utility, data inputs, and training requirements of this tool need better definition. Possible liaison with the FEWS "Priceman" effort, particularly its data feeds, could be beneficial. A simple tool to outline one's field, calculate the area, and determine potential costs and market values is a part of this simple calculator.
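
The outline-area-cost calculation can be sketched as follows. The sketch assumes the field outline is given in a projected coordinate system with units of metres (not raw latitude/longitude), and every figure in the example (yield, price, per-hectare costs) is invented purely for illustration.

```python
from typing import Dict, List, Tuple


def polygon_area_ha(vertices: List[Tuple[float, float]]) -> float:
    """Area of a simple polygon by the shoelace formula, in hectares.
    Vertices are (x, y) pairs in metres, listed in order around the field."""
    twice_area = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / 10_000.0  # 1 ha = 10,000 square metres


def net_margin(
    area_ha: float,
    yield_t_per_ha: float,
    price_per_t: float,
    costs_per_ha: Dict[str, float],
) -> float:
    """Expected revenue minus the summed per-hectare input costs."""
    revenue = area_ha * yield_t_per_ha * price_per_t
    costs = area_ha * sum(costs_per_ha.values())
    return revenue - costs


# Hypothetical 100 m x 200 m field and made-up prices.
field = [(0.0, 0.0), (100.0, 0.0), (100.0, 200.0), (0.0, 200.0)]
area = polygon_area_ha(field)
margin = net_margin(
    area,
    yield_t_per_ha=3.0,
    price_per_t=150.0,
    costs_per_ha={
        "fertilizer": 60.0,
        "irrigation": 40.0,
        "pest_control": 20.0,
        "tillage": 30.0,
        "harvest": 25.0,
        "transport": 15.0,
    },
)
```

For this example the field is 2 hectares and the margin is 520 in whatever currency the prices are expressed in.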

Image Processing and Spatial Analysis
Phase 2 of this effort develops powerful image processing and spatial analysis functionality so users can manipulate gigabytes of multitemporal remote sensing imagery over modem connections, once their workspace has been provisioned. This will generally be implemented with a collection of pre-packaged, HTML POST and applet-based web pages that run GRASS i.* and r.* programs on the server, against local files. One important application will be an online version of the FAO Land Cover Classification System (LCCS) tools and documentation, with a supporting Java applet-based drawing tool and dialog to run a pre-packaged, batch version of the GRASS i.cluster, i.group, i.class, i.gensig, i.smap, and i.maxlik programs. Assistance and partnership with GRASS national user groups will be established.

A conscious effort will be made to enforce modularity of this type of capability, so that portal installations can choose to install commercial image processing application suites, such as ERDAS, PCI, ENVI, ERMapper, IDL or ION.

As previously mentioned, an accompanying applet will enable users to interactively "pick" a geographic point, and retrieve the value of all bands of one or many multispectral images at that point. In this manner multispectral, multitemporal signatures of a wide variety of phenomena may be collected and shared with associated ground truth data by users worldwide.

Shared Desktops and VOIP
Beyond traditional newsgroups, email, and metadata publishing, this project will support a bank of VNC servers for interactive distance learning, expert "hands-on" advice, and remote access to traditional "fat" desktop applications. Suitable configuration on a standard http port (80) will enable transparent access through most firewalls. Distance learning sessions will also be authored in Wink. Additionally, a workable Internet telephony conferencing framework will be established, based either upon mature applications such as Asterisk (open source) or Skype, or in conjunction with an established VOIP carrier. In this manner extremely inexpensive voice conferencing may be part of the overall collaboration experience. It is recognized that this aspect of rural Internet connectivity, low cost voice over IP telephony services, is an entire industry unto itself, and will continue to spawn services such as CUWorld, Quicknet, and others. It is hoped that a baseline system might be established to make the adoption of sustainable agriculture and relief work more effective. In particular, the Alfresco document management system is being designed to interface directly to a VOIP interactive voice response (IVR) system, such that email, newsgroup, simple content searches and browsing can be accomplished using touch-tone telephones. The sample text-to-speech engine will be FreeTTS. Suitable, more powerful and multilingual engines, such as NaturalVoices or Sayco, can be integrated as needed with the core content engines. Non-numeric input, specifically lat/long coordinates, can be achieved using speech recognition software. Again, a simple free English recognizer, Sphinx, is provided to demonstrate this capability.

Because VNC sessions execute at portal locations, they enable access to traditional GUI desktop applications, running on high-performance CPU power against content spinning on local disks. If a node operator chooses to purchase their own licenses, a generic "exec" portlet to spawn a new VNC session, with the authenticated user's credentials, can be used to run ArcGIS, or another suitable desktop application, within a Java-enabled browser applet or a small Windows Active-X component. The "exec" portlet will log appropriate "call detail records" for reconciliation and billing purposes. For high volume centers supporting a large number of concurrent desktop and/or applet-based sessions, a site can be configured to use a pool of servers, where user sessions are provisioned on demand using GRID Engine.

Run-time Access Control
Because the VNC session will be executed with the credentials of the authenticated user, the session will be granted access only to those content store files whose access control lists include that user. In this manner, the terms of "shared buy" data programs, such as the Multi-Resolution Land Characteristics Consortium, can be enforced for sites that gain sufficient trust of data vendors. Another option, given sufficient bandwidth, is to use such data as a metered remote OpenGIS WCS from GlobeXplorer or others with CDR logging capability.
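
The run-time check can be sketched as a simple filter over access control lists. The representation below (a mapping from file path to the set of permitted principals, with group membership treated as an additional principal) is an assumption for illustration; the actual content store's ACL model is not specified in this document, and the file names are hypothetical.

```python
from typing import Dict, List, Set


def accessible_files(
    user: str, user_groups: Set[str], acls: Dict[str, Set[str]]
) -> List[str]:
    """Return the content store paths whose ACL names the user directly
    or names one of the user's groups (assumed extension)."""
    principals = {user} | user_groups
    return sorted(path for path, allowed in acls.items() if allowed & principals)


# Hypothetical content store entries.
example_acls: Dict[str, Set[str]] = {
    "mrlc/landcover.tif": {"mrlc-members"},
    "public/srtm.tif": {"everyone"},
    "private/yield.shp": {"amina"},
}
files_for_bo = accessible_files("bo", {"everyone"}, example_acls)
files_for_amina = accessible_files("amina", {"everyone", "mrlc-members"}, example_acls)
```

Here the shared-buy land cover file is visible only to sessions whose user belongs to the licensed group, which is how the terms of such a program could be enforced.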

Distance Learning
Because a major goal of this infrastructure is to facilitate outreach and distance learning among rural communities, a strong emphasis will be placed on creating structured rendezvous processes, using calendars, searchable registries, and group messaging, to link appropriate domain specialists in agronomy, entomology, soil science, hydrology, and image processing with users throughout the federation. Beyond threaded email lists, interactive support via text "instant messaging", shared workspace, and voice conferencing will be encouraged. As previously mentioned, a strong effort to establish acceptance of standard voice-over-IP (VOIP) and shared desktop conferencing infrastructure will be made early on and encouraged for all new users. Additionally, because the VNC server enables multiple simultaneous input streams and remote display buffers, it is also an ideal tool for distance learning. Experts worldwide can interactively "take control" of a shared workspace to run training sessions or address site-specific issues. A directory and calendaring system, fully integrated with the Alfresco framework, will enable users to rendezvous with domain specialists in image processing, agronomy, entomology, soil science, and hydrology. As with all other items, call detail records will be generated for future reconciliation between stakeholders. There is even an Active-X component built around VNC (ExpertVNC) that dynamically installs a VNC server "on the fly", thereby enabling call-center experts to "take control" of a remote user's desktop and local files.

It is believed that VNC-based training, while never as effective as face-to-face interaction, will greatly facilitate outreach and acceptance of the overall network, and enable highly specialized experts worldwide to provide assistance during rapidly changing, critical situations in the growing season, or immediately following "shock" events such as hurricanes or floods.

Simulation and Decision Support

Effective decision-making is greatly enhanced by accurate underlying models of system dynamics. In agriculture and epidemiology, this means effectively modeling crops, soils, water, and disease vectors subject to environmental conditions and management practices.

Therefore, a major goal of the effort is to support useful simulation tools. Temperature and rainfall baseline datasets will be pre-populated with MODIS and TRMM data, and augmented by uploaded data and local data feeds. Users can add extremely detailed site-specific information to calibrate their models using local knowledge, particularly yield maps. An online version of the ICASA/DSSAT models will be available in the user's workspace. For soils data, users will be encouraged to use SDBm Plus.

Specifically targeted applications are:

Presentation of simulation results will be accomplished using Minnesota Map Server's GDAL capability.

Beyond these, a general-purpose dialog to build reusable libraries of GRASS r.mapcalc, r.cost, r.spread, and other commands for expressing arbitrary spatial processes will be explored, so that important analyses might be easily assembled and reused. Universities worldwide will be invited to engage in collaborative development of new application portlets in agriculture, hydrology, conservation, epidemiology, and other disciplines that can leverage SRTM, TRMM, temperature, Landsat, and user-provided datasets.

Integrated Processes
Finally, two highly-instrumented test sites will be selected to evaluate the comprehensive “FLORES” simulation environment. This portion of the effort will drive consideration of how to reuse detailed field measurements between three powerful but different environments: GRASS, DSSAT, and Simulistics. It will also provide extremely useful insight into how to approach landscape-level, reusable, modular model development and policy evaluation efforts.

One of the Simulistics sites will be within the Ohio Little Miami River Basin, one of the US EPA's National Water Quality Assessment verification sites. This site is desirable because it is extremely well instrumented, has corn, soybean, and wheat crops, and has access to free real-time Landsat data through the OhioView program.

Semantics

Taxonomic Support
Beyond the handful of simple categories enumerated in ISO/TC211 19115 B.5.27, a strong emphasis will be placed on rigorous categorization of content, groups, and workflow items according to well-known taxonomies and controlled vocabularies, beginning with land cover data according to the FAO LCCS specification. It is felt that adequate investment early on is essential to address the specialized needs of the agricultural sciences community. Pre-existing vocabularies and taxonomies are abundant and extremely important to CGIAR and FAO member countries. An important standards group in this arena is the International Working Group on Taxonomic Databases (TDWG). In any case, however, local farmers will not simply "drop" their practices in favor of a universal system; indeed, much knowledge is to be captured by adequately structuring indigenous taxonomies. But beyond these, numerous "standard" vocabularies and taxonomies already exist: those of SDBm Plus, GCMD keywords, SDTS feature codes, NIMA place names, FGDC Biological Profile, etc. Standardization of taxonomies is in its infancy, and is actively being debated within the OpenGIS Web Registry group. Another important forum for this topic is the newly formed Taxonomy and Semantics Working Group of KM.gov, formed by the US government's Federal Chief Information Officers Council. Again, FAO, CGIAR and USAID are clearly in a leadership position to establish a set of reusable standard taxonomies, which would in and of itself be a major step towards knowledge management.

Tools
An Alfresco tool is being developed to capture and edit a taxonomy using a simplistic recursive, single-inheritance Java Bean structure and an associated category class browser. Taxonomies will be captured in a manner consistent with the “tModel” category mechanism of the Universal Description, Discovery and Integration (UDDI) specification. Nodes of this taxonomy will themselves be content items, and can contain multilingual descriptions, photographs, or audio files. Taxonomies will be maintained by members of a special group, in order to avoid "stupid user errors." Once a taxonomy is established, it can be searched using multilingual full-text queries, or interactively with an applet helper. Once a category is found, it can be used to "tag" other content items. Content can be categorized within multiple different taxonomies. Additionally, the body of individual .pdf, .doc, and .html documents can be indexed for full-text queries as well. Future planned enhancements include the ability to "tag" individual sections of documents. Similarly, individual database columns of uploaded ESRI shapefiles, member groups or workflow items can also be taxonomically and spatially categorized. Queries for content, interest groups, and events categorized as belonging to, for example, "rice in this watershed" can then be formulated and their results portrayed graphically. Extending this capability to RDF and the semantic web is a clearly identified area of collaborative research; see the Knowledge Structuring section of this document.
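
The recursive, single-inheritance category structure and the tagging of content can be sketched as below. The sketch is written in Python rather than as a Java Bean, does not model UDDI itself, and all class, function, and file names in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(eq=False)
class Category:
    """One taxonomy node; each node has at most one parent."""
    key: str
    labels: Dict[str, str]  # language code -> description
    parent: Optional["Category"] = None
    children: List["Category"] = field(default_factory=list)

    def add_child(self, key: str, labels: Dict[str, str]) -> "Category":
        child = Category(key, labels, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Keys from the taxonomy root down to this node."""
        keys: List[str] = []
        node: Optional[Category] = self
        while node is not None:
            keys.append(node.key)
            node = node.parent
        return list(reversed(keys))

    def descendants(self) -> List["Category"]:
        """This node followed by every node beneath it."""
        found = [self]
        for child in self.children:
            found.extend(child.descendants())
        return found


def search(root: Category, text: str, lang: str) -> List[Category]:
    """Naive multilingual search over the labels of a taxonomy."""
    needle = text.lower()
    return [c for c in root.descendants() if needle in c.labels.get(lang, "").lower()]


def items_under(category: Category, tags: Dict[str, Set[str]]) -> List[str]:
    """Content items tagged with the category or any of its descendants."""
    keys = {c.key for c in category.descendants()}
    return sorted(item for item, cats in tags.items() if cats & keys)


crops = Category("crops", {"en": "Crops", "es": "Cultivos"})
cereals = crops.add_child("cereals", {"en": "Cereals", "es": "Cereales"})
rice = cereals.add_child("rice", {"en": "Rice", "es": "Arroz"})
content_tags: Dict[str, Set[str]] = {
    "field_notes.pdf": {"rice"},
    "soil_map.shp": {"cereals"},
    "road.shp": set(),
}
rice_hits = [c.key for c in search(crops, "arroz", "es")]
cereal_items = items_under(cereals, content_tags)
```

Because tags are stored as sets of category keys, one content item can carry categories from several taxonomies at once, as the text requires.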

Collaborative Development and Research

Positioning
As illustrated throughout this document, the portal effort will be squarely based upon OpenGIS standards and numerous mainstream, cutting-edge open-source software efforts. As such, it directly benefits from ongoing, active development of the individual packages. The core competences of the World Bank, USAID, FAO and CGIAR are complementary: domain expertise, and a field presence applying technologies to food security, poverty alleviation, and environmental management. This effort provides a unique opportunity to test theory against the real world, in concert with the majority of civilian ground stations and Earth Observation archives. Member development agencies need to assert these strengths and their diplomatic status to influence ESA-sponsored GMES activities, Canadian GeoConnections, US NASA and NSF research, and other well-funded activities in this arena to address the crushing issues humanity faces in the next few decades.

University and NGO research
Beyond agriculture and food security, it is envisioned that this framework for application development and knowledge capture, with structured business process definitions for scalable production of derivative spatial products based on taxonomic metadata, PKI security, value chaining, spatial content management, remote dataset discovery, and GRID-based process execution, can offer universities, national agencies, and NGOs worldwide a wide yet focused spectrum of collaborative research opportunities. It can also enable FAO/CGIAR to tap external funding sources for major enhancements to monitoring and hypothesis evaluation of the GM crop activities, vital to feeding a hungry, growing human population. Ongoing effort will be made to facilitate multi-level aggregation of real data, whereby aggregates suitable for regional statistics and economic policy can draw directly upon the most accurate data available: that maintained by "pixel inhabitants."

The initial sites are intentionally located in Africa and South America, where CBERS, Landsat, and other core data sets are free, where there is already a large overlap with UNDP, UNEP, and World Bank field operations, and where FEWS activities can be leveraged. Strategic discussions should commence immediately as to how this effort may be positioned relative to their activities for data collection, processing, and dissemination. Also, the projects of NASA's Research, Education, and Applications Solutions Network (REASoN) are just beginning their work, and will no doubt be creating several subsystems, particularly thin-wire scientific applications, directly applicable to FAO and CGIAR's mission. Again, the project outlined in this document provides a coordinated conduit for that technology, as well as similar development efforts in Europe, India, China, and elsewhere, to reach and benefit UN member countries at the individual farm and watershed level.

Site Validation Efforts
As previously stated, it is recommended that a core strategy of the portal be to recruit and coordinate test agricultural sites among their field stations worldwide in order to present a single "unified front" to CEOS members for sensor validation, in conjunction with their existing Global Land Cover Test Sites. Additionally, well-established ecological monitoring sites, particularly TEMS sites, should be actively recruited to become early adopters of the framework.

Commercial Partnerships
Because the overall framework comprises a global comprehensive decision-support framework, with usage-tracking capability, a wide range of commercial partnerships is possible. First and foremost are partnerships with commercial operators such as GlobeXplorer, SPOT, Space Imaging, and others to enhance the data available for manipulation. Next, FAO and CGIAR should attempt to coordinate this effort with other lending institutions' GIS activities, and particularly with major funded GIS software system integration efforts with contractors such as Chemonics International, IBM Global Services, and others. Major equipment and fertilizer manufacturers should also be involved early on for training and outreach.

Crop Models
Systematic collaborations with ongoing CGIAR, USDA, FAO, ICASA, and other crop modeling efforts to share results and calibration data.

Farm-level Economic Models
Integration with a wide range of extremely useful tools, such as Texas A&M's FLIPSIM.

Knowledge Structuring
One identified area of research, consistent with the open model of the effort, is to employ taxonomic mechanisms such as the “KAON” semantic framework to adequately structure and cross-reference knowledge about and between biological, soils-related, and other taxonomic groupings. This area represents a clear opportunity for university research worldwide, which might immediately “tap into” the CGIAR user base worldwide. It is felt that examining how to adequately categorize DSSAT-capable measurements within OpenGIS and ISO/TC211 efforts, as well as how to achieve interoperability between GRASS and FLORES, will quickly require such a robust framework. It is also felt that this is a long-term strategic necessity, particularly if CGIAR and FAO are to help member countries keep up with international biological taxonomy and eco-complexity efforts, adequately describe their ecosystems in a time of rapid species extinction, climate change, and introduction of GM organisms, and position FAO and CGIAR as leaders in major knowledge sharing efforts such as FIVIMS and the World Bank Knowledge Sharing program.

High Performance Computing
Another important area for collaboration is to develop a library of reusable cellular automata models for frameworks such as the parallel, open-source Spatial Modeling Environment, compiled and linked as formal, generalized modular models. Of particular interest to IFPRI and other policy institutions is using spatially explicit crop production estimates to drive economic-ecologic models and technology adoption models with household sub-models.

In this manner, models created by domain specialists in GRASS mapcalc, Simulistics model files, ArcGIS spatial analyst XML model files, or similar tools might be combined together, linked to social cellular automata models, and executed on D-GRID, Teragrid or DataTAG.

Incorporation of Earth System Modeling Framework coupling routines might enable multi-level aggregations between regional, high resolution models and coarse-grained, global models within the overall GTOS program.

Another area is mobile OGSA GRID agents, whereby certified algorithms and datasets can migrate to archives on demand. In this manner, knowledge domain experts might post specialized, X.509 "signed" models that can be assembled anywhere on demand within the federation to support derivative product creation in an optimal fashion.

Because security and accounting will be enforced throughout all aspects of user and group interactions, multi-level, distributed resource gaming might be accomplished. A scenario can be envisioned where localized variables are controlled by appropriate delegated sovereigns, "environmental" variables are set by lending institutions or CEOS member supercomputer runs, and a variety of possible scenarios are evaluated at several levels of hierarchy and linkage. This is precisely the goal of the Earth System Modeling Framework.

It is also conceivable to integrate with online socio-economic data, and conduct multiplayer "SimCity"-like games among stakeholders at multiple levels of hierarchy, with stochastic social or political models, for simulating food shortage, monetary crisis, or climate change scenarios caused by a wide variety of circumstances.