Commit 76ec5bf: Update README.md

pebau authored Dec 15, 2023 (parent commit c8583fc)

1 changed file (README.md) with 3 additions and 151 deletions
Data needed by the Use Cases will be imported into the FAIRiCUBE HUB to form datacubes. Datacubes make data access easier because they are homogenized in structure and access. If you have ever had to go through different data repositories, harvest data, and then homogenize it yourself, you will appreciate how much it helps to push that work "behind the curtain" by automating access. Of course, to provide data in such a "beautified" manner, somebody still has to do the homogenization work. In FAIRiCUBE (FC), that is: us. Tools like the data request form and the rasdaman ETL suite help greatly, but given the vast divergence of data, the final homogenized datacubes still need careful human hands. Improving the life of data wranglers is, in fact, a key mission of the project.

Contents:
- [How to Get Data Added](#how-to-get-data-added)
- [Finding data](#finding-data)
- [Limitations](#limitations)
- [How-to](#how-to)
- [Use Cases](#use-cases)
- [Further details](https://github.com/FAIRiCUBE/data-requests/wiki) on finding data, datacube access how-to, and use-case-specific modeling and access


## How to Get Data Added
In order to become visible to all, the new data set also needs to get its twin e…

[Here](encoding-examples/dominant_leaf_type-metadata.xml) is an example of a metadata record compliant with the OGC coverage standard.

## Finding data

### Listing contents

Services support direct listing of their contents, though not necessarily with the convenience of the planned catalog:

- rasdaman datacubes: [get list of datacubes](http://fairicube.rasdaman.com:8080/rasdaman/ows?&SERVICE=WCS&ACCEPTVERSIONS=2.0.1&REQUEST=GetCapabilities) (requires authentication). Beware: the response is an OGC-compliant XML document; search it for "CoverageSummary" entries.
- EOX: (tbd)
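If you want to process the rasdaman capabilities document programmatically, the following Python sketch (standard library only) extracts the datacube identifiers from it. It assumes the standard WCS 2.0 namespace for the `CoverageSummary`/`CoverageId` elements. For the download part it assumes HTTP Basic authentication, which is our assumption and not confirmed here, so adapt the hypothetical `fetch_capabilities` to whatever authentication your account actually uses.

```python
from __future__ import annotations

import base64
import urllib.request
import xml.etree.ElementTree as ET

# WCS 2.0 namespace of the Contents/CoverageSummary/CoverageId elements.
WCS_NS = {"wcs": "http://www.opengis.net/wcs/2.0"}

CAPABILITIES_URL = (
    "http://fairicube.rasdaman.com:8080/rasdaman/ows"
    "?SERVICE=WCS&ACCEPTVERSIONS=2.0.1&REQUEST=GetCapabilities"
)


def fetch_capabilities(user: str, password: str) -> bytes:
    """Download the raw capabilities XML.

    ASSUMPTION: the server accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        CAPABILITIES_URL, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def list_coverage_ids(capabilities_xml: str | bytes) -> list[str]:
    """Return the CoverageId of every CoverageSummary, in document order."""
    root = ET.fromstring(capabilities_xml)
    summary_tag = f"{{{WCS_NS['wcs']}}}CoverageSummary"
    ids = []
    for summary in root.iter(summary_tag):
        coverage_id = summary.findtext("wcs:CoverageId", namespaces=WCS_NS)
        if coverage_id:
            ids.append(coverage_id.strip())
    return ids
```

Typical use would be `list_coverage_ids(fetch_capabilities(user, password))`. Parsing and fetching are kept separate, so the parser can also be used on a capabilities file saved from the browser.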

### Catalog

An easy way to browse the available datasets is the [catalog](https://catalog.fairicube.eu/). Note that it is still under development and catching up with the datasets available.

## Limitations

- Time stamps on several datacubes follow peculiar mechanics that rasdaman does not yet support. Therefore, the time axis has for now been modelled as an index (Cartesian) axis, meaning that temporal access (such as with the TIME parameter in WMS 1.3) is not yet possible. Full temporal support is planned to become available within 2023.
- Due to minor misalignments between OGC standards, some facets of the XML schemas do not validate. With most tools, however, this is not an issue when using the data.


## How-to

In this section we give a brief introduction to datacube wrangling. First, terminology: in the standardization world, datacubes are modeled as "coverages". Most relevant are the OGC Coverage Implementation Schema (CIS) as the data model and the Web Coverage Service (WCS) as the processing model, which includes the Web Coverage Processing Service (WCPS) datacube analytics language. So don't be surprised to see "coverages" mentioned below.

We first present a general overview of standards-based datacube access, and then provide some use-case-specific examples. If you want to see further examples added, [contact us](mailto:[email protected])!

### Coverages

Coverages are designed to be self-describing. While more metadata can always be added to an object, a coverage contains the essentials for understanding its pixels. The canonical structure of a coverage consists of:

- domain set: where can I find values?
- range set: the values.
- range type: what do the values mean?
- metadata: what else should I know about these data?
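To make these four parts more tangible, here is a deliberately simplified Python sketch of a 2-D coverage on a regular grid with Easting/Northing axes. It is not the CIS data model itself, which also covers irregular grids, time axes, and much more, and all class and field names are hypothetical. The domain set answers "where can I find values?" by mapping a coordinate to a cell, and the range set then supplies the value.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegularGridDomain:
    """Domain set of a 2-D regular grid in a projected CRS (E/N axes)."""

    origin_e: float    # easting of the lower-left grid corner
    origin_n: float    # northing of the lower-left grid corner
    resolution: float  # cell size in CRS units, e.g. metres
    width: int         # number of cells along E
    height: int        # number of cells along N

    def cell_index(self, e: float, n: float) -> tuple[int, int]:
        """Map a coordinate to (column, row) of its cell; row 0 is southernmost."""
        col = int((e - self.origin_e) // self.resolution)
        row = int((n - self.origin_n) // self.resolution)
        if not (0 <= col < self.width and 0 <= row < self.height):
            raise ValueError(f"({e}, {n}) lies outside the domain set")
        return col, row


@dataclass
class Coverage:
    domain_set: RegularGridDomain  # where can I find values?
    range_set: list[list[float]]   # the values, indexed [row][col]
    range_type: dict[str, str]     # what do the values mean? (field -> description)
    metadata: dict[str, str] = field(default_factory=dict)  # what else to know

    def value_at(self, e: float, n: float) -> float:
        col, row = self.domain_set.cell_index(e, n)
        return self.range_set[row][col]
```

For example, with a 3 x 2 grid of 10 m cells starting at (0, 0), `value_at(25.0, 5.0)` returns the value in column 2 of the southernmost row.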

Coverages can be encoded in a variety of data formats. Text formats include XML, JSON, and RDF; binary formats include GeoTIFF, NetCDF, and Grib2.

See [this tutorial](https://earthserver.eu/wcs/#cis) for more details on CIS, and the [FAIRiCUBE encoding examples](https://github.com/FAIRiCUBE/data-requests/tree/main/encoding-examples) for concrete encodings.

#### Coverage Access

The Web Coverage Service (WCS), in its current version 2.1, defines data access with delivery in a user-selected encoding, spatio-temporal subsetting, scaling, and reprojection, as well as processing (see the next section). Such requests are expressed as HTTP GET or POST requests, as this example against the FAIRiCUBE rasdaman service shows (whitespace is added only for readability and is not part of the request):

```
https://fairicube.rasdaman.com/rasdaman/ows
? SERVICE=WCS & VERSION=2.1.0 & REQUEST=GetCoverage
& COVERAGEID=A
& SUBSET=date( "2018-05-22" )
& SUBSET=E( 332796 : 380817 )
& SUBSET=N( 6029000 : 6055000 )
& FORMAT=image/png
```

As per OGC syntax, date/time strings need to be quoted.

Note that HTTP requires certain characters to be ["URL-encoded"](https://www.urlencoder.io/) before submission. Browsers often do that automatically, but programmatically generated requests must take care of it themselves.
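When generating requests from code, let a library do the encoding. The following Python sketch uses only the standard library (`urllib.parse.urlencode`) to build the GetCoverage example. It includes the `COVERAGEID` parameter, which a WCS GetCoverage request requires, with `A` as a placeholder coverage name, and it drops the spaces that were only there for readability. The function name is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

ENDPOINT = "https://fairicube.rasdaman.com/rasdaman/ows"


def get_coverage_url(coverage_id: str, subsets: list[str], fmt: str) -> str:
    """Build a URL-encoded WCS 2.1 GetCoverage request.

    A list of pairs is used instead of a dict because SUBSET may repeat.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.1.0"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
    ]
    params += [("SUBSET", subset) for subset in subsets]
    params.append(("FORMAT", fmt))
    # urlencode() percent-encodes quotes, parentheses, colons and slashes,
    # e.g. date("2018-05-22") becomes date%28%222018-05-22%22%29.
    return f"{ENDPOINT}?{urlencode(params)}"


url = get_coverage_url(
    "A",  # placeholder coverage name
    ['date("2018-05-22")', "E(332796:380817)", "N(6029000:6055000)"],
    "image/png",
)
print(url)
```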

See [this tutorial](https://earthserver.eu/wcs/#wcs) for more details on WCS.

### Coverage Processing

WCPS allows processing, aggregation, fusion, and more on datacubes with a high-level, easy-to-use language that does not require programming skills (e.g., in Python). The following example inspects coverage A and returns a cutout whose spatial extent is expressed in Easting (E) and Northing (N) coordinates (assuming this is the native coordinate reference system of the coverage), sliced at a single point in time and encoded as PNG:

```
for $c in ( A )
return
  encode( $c [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ], "png" )
```

Such a query can be sent through the WCS ProcessCoverages request:

```
https://fairicube.rasdaman.com/rasdaman/ows
? SERVICE=WCS & VERSION=2.1.0 & REQUEST=ProcessCoverages
& QUERY=for $c in ( A ) return encode( $c [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ], "png" )
```

Again, remember that [URL-encoding](https://www.urlencoder.io/) needs to be applied before sending.
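Since WCPS queries contain spaces, `$`, quotes, and brackets, it is easiest to assemble and encode them programmatically. This Python sketch (standard library only; both function names are hypothetical) builds the cutout query from its parts, naming the Easting axis explicitly as `E`, and wraps it into a ProcessCoverages GET URL. Passing `quote_via=quote` makes spaces come out as `%20` rather than `+`.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

ENDPOINT = "https://fairicube.rasdaman.com/rasdaman/ows"


def cutout_query(coverage: str, date: str, e: tuple[int, int],
                 n: tuple[int, int], fmt: str = "png") -> str:
    """Build the WCPS cutout query for any coverage, date, and E/N range."""
    return (
        f"for $c in ( {coverage} ) return encode( $c [ "
        f'date( "{date}" ), E( {e[0]} : {e[1]} ), N( {n[0]} : {n[1]} ) ], "{fmt}" )'
    )


def process_coverages_url(query: str) -> str:
    """Wrap a WCPS query into a URL-encoded WCS ProcessCoverages GET request."""
    params = {
        "SERVICE": "WCS",
        "VERSION": "2.1.0",
        "REQUEST": "ProcessCoverages",
        "QUERY": query,
    }
    # Spaces become %20, "$" becomes %24, quotes become %22.
    return f"{ENDPOINT}?{urlencode(params, quote_via=quote)}"


query = cutout_query("A", "2018-05-22", (332796, 380817), (6029000, 6055000))
print(process_coverages_url(query))
```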

So far, each coverage has been processed in isolation. Data fusion is possible through “nested loops”:

```
for $a in ( A ), $b in ( B )
return encode( $a + $b, "png" )
```
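Conceptually, `$a + $b` is evaluated cell by cell over the two datacubes. The following plain-Python sketch only illustrates these semantics on small in-memory grids; it is not how rasdaman executes the query. It assumes both inputs share the same grid, which a cell-wise combination requires. The function name is hypothetical.

```python
from __future__ import annotations


def induced_add(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Cell-by-cell a + b, a plain-Python analogue of the WCPS expression $a + $b.

    Both grids must have the same shape (same rows, same columns per row).
    """
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("grids must have identical shapes for cell-wise addition")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


print(induced_add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # [[11, 22], [33, 44]]
```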

Aggregation plays an important role in reducing the amount of data transported to the client. With the common aggregation operators, in WCPS called “condensers”, queries like the following are possible (note that no format encoding is needed, as scalar results are returned as ASCII text):

```
for $a in ( A )
return max( $a )
```
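To make the condenser idea concrete, here is a plain-Python analogue of a few common WCPS condensers (`add`, `avg`, `min`, `max`, `count`), each reducing a small in-memory grid to a single number. It also includes a helper that turns the ASCII response body into a Python float. The helper assumes the body holds one plain number, as with the `max` query above. All names are hypothetical.

```python
from __future__ import annotations

from statistics import fmean

# Plain-Python analogues of common WCPS condensers: each one reduces
# all cells of a grid to a single scalar. "count" counts true cells,
# as WCPS count() does on a boolean expression.
CONDENSERS = {
    "add": sum,
    "avg": fmean,
    "min": min,
    "max": max,
    "count": lambda cells: sum(1 for cell in cells if cell),
}


def condense(op: str, grid: list[list[float]]) -> float:
    """Apply the named condenser to every cell of a 2-D grid."""
    cells = [value for row in grid for value in row]
    return CONDENSERS[op](cells)


def parse_scalar_response(body: bytes) -> float:
    """Convert an ASCII scalar response body such as b"42\\n" into a float."""
    return float(body.decode("ascii").strip())
```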

As a final example, the following WCPS query computes the Inverted Red-Edge Chlorophyll Index (IRECI) on a selected space/time region, performs contrast reduction for visualization, and delivers the result reprojected to EPSG:4326:

```
for $c in ( S2_L2A_32633_B07_60m ),
    $d in ( S2_L2A_32633_B04_60m ),
    $e in ( S2_L2A_32633_B05_60m ),
    $f in ( S2_L2A_32633_B06_60m )
let $sub := [ date( "2018-05-22" ), E( 332796 : 380817 ), N( 6029000 : 6055000 ) ]
return
  encode(
    crsTransform(
      ( ( $c - $d ) / ( $e / $f ) ) [ $sub ],
      { E: "EPSG:4326", N: "EPSG:4326" }
    ) / 50,
    "png"
  )
```
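To check results locally, or simply to see what the query computes per pixel, here is the same arithmetic in plain Python: IRECI = (B7 - B4) / (B5 / B6), matching the query's assignment $c = B07, $d = B04, $e = B05, $f = B06, followed by the division by 50. Subsetting and reprojection (`crsTransform`) are left to the server and are not reproduced. Returning NaN for zero denominators is our own choice, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def ireci(b7: float, b4: float, b5: float, b6: float) -> float:
    """Inverted Red-Edge Chlorophyll Index for one pixel: (B7 - B4) / (B5 / B6).

    Returns NaN where B5 or B6 is zero, since the ratio is then undefined.
    """
    if b5 == 0 or b6 == 0:
        return math.nan
    return (b7 - b4) / (b5 / b6)


def ireci_for_display(b7, b4, b5, b6, scale: float = 50.0):
    """Per-pixel IRECI on equally shaped 2-D grids, divided by `scale` (50 in the query)."""
    return [
        [ireci(p7, p4, p5, p6) / scale for p7, p4, p5, p6 in zip(r7, r4, r5, r6)]
        for r7, r4, r5, r6 in zip(b7, b4, b5, b6)
    ]
```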

See [this tutorial](https://earthserver.eu/wcs/#wcps) for more details on WCPS.

## Use Cases

### ML Use Case

tbd

### Drosophila Use Case

#### Genome Data

Corresponding data request issue: [Genomic data of Drosophila](https://github.com/FAIRiCUBE/data-requests/issues/86)

#### Occurrence Cube

Corresponding data request issue: [Distribution data of Drosophila](https://github.com/FAIRiCUBE/data-requests/issues/87).
[GBIF data](https://www.gbif.org/dataset/search?q=) are described in [this issue](https://github.com/FAIRiCUBE/data-requests/issues/71). Our sister project B-Cubed, with GBIF as a partner, will provide selected data.

Datacube Structure:

- Domain dimensions:
  - Lat, Long:
    - Extent (RD - EPSG:28992): Xmin: 168280, Xmax: 223880, Ymin: 512055, Ymax: 535555
    - Extent (lat/lon): Xmin: 5.5831989141242966, Xmax: 6.4086515407429623, Ymin: 52.5917375949509562, Ymax: 52.8070852699905871
    - Resolution: 10m
  - Time (year): 2018
  - Taxon: 7-digit taxon id, categorical
- Range type:
  - Count: float, no-data: -1
  - Maximum Uncertainty: float, no-data: -1
- Metadata:
  - For the Taxon dimension, provide [Scientific Name table](https://github.com/FAIRiCUBE/data-requests/issues/71#issuecomment-1819084964)
- Input format: [CSV](https://github.com/FAIRiCUBE/data-requests/blob/main/encoding-examples/datacube_nl_farmland_birds_1.csv) with columns Year, EEA Grid Cell, TaxonID, Count, Uncertainty
  - EEA reference grid cell identifiers, e.g. 1kmE5432N4321 or 250mE1025N22000
  - In contrast to the datasets to date, GBIF provides Lat/Long through a grid cell id for the LAEA 10m grid
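Input rows can be checked before ingestion with a small Python sketch like the one below. It assumes the CSV header is exactly `Year,EEA Grid Cell,TaxonID,Count,Uncertainty` and that there is one row per (Year, cell, TaxonID). Missing cells are reported with the no-data value -1 from the range type. `parse_cell_id` only splits identifiers shaped like the examples above into a cell size and the E/N numbers as written. Converting those numbers to LAEA coordinates depends on the grid convention, so the sketch deliberately does not attempt it. All function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
import re

NO_DATA = -1.0  # no-data value of both Count and Maximum Uncertainty

# Matches ids shaped like "1kmE5432N4321" or "250mE1025N22000".
CELL_ID = re.compile(r"^(?P<size>\d+)(?P<unit>km|m)E(?P<e>\d+)N(?P<n>\d+)$")


def parse_cell_id(cell_id: str) -> tuple[int, int, int]:
    """Return (cell size in metres, E number, N number) exactly as written in the id."""
    match = CELL_ID.match(cell_id)
    if match is None:
        raise ValueError(f"unrecognised grid cell id: {cell_id!r}")
    factor = 1000 if match["unit"] == "km" else 1
    return int(match["size"]) * factor, int(match["e"]), int(match["n"])


def load_observations(csv_text: str) -> dict[tuple[int, str, str], tuple[float, float]]:
    """Index rows by (Year, EEA Grid Cell, TaxonID) -> (Count, Uncertainty)."""
    cube: dict[tuple[int, str, str], tuple[float, float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        parse_cell_id(row["EEA Grid Cell"])  # validate the id format early
        key = (int(row["Year"]), row["EEA Grid Cell"], row["TaxonID"])
        if key in cube:
            raise ValueError(f"duplicate row for {key}")
        cube[key] = (float(row["Count"]), float(row["Uncertainty"]))
    return cube


def lookup(cube, year: int, cell_id: str, taxon_id: str) -> tuple[float, float]:
    """Return (Count, Uncertainty), or the no-data value for cells without data."""
    return cube.get((year, cell_id, taxon_id), (NO_DATA, NO_DATA))
```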

### EOX-based use cases
