automate docs
andygrove committed May 2, 2024
1 parent 6d903c3 commit 4c24e41
Showing 5 changed files with 200 additions and 15 deletions.
159 changes: 159 additions & 0 deletions docs/source/user-guide/compatibility.md
@@ -0,0 +1,159 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Compatibility Guide

Comet aims to produce results that are consistent with the version of Apache Spark in use.

This guide documents the areas of functionality where there are known differences.

## ANSI mode

Comet currently ignores ANSI mode in most cases, and therefore can produce different results than Spark. By default,
Comet will fall back to Spark if ANSI mode is enabled. To enable Comet to accelerate queries when ANSI mode is enabled,
specify `spark.comet.ansi.enabled=true` in the Spark configuration. Comet's ANSI support is experimental and should not
be used in production.

The work to fully implement ANSI support is tracked in this [epic](https://github.com/apache/datafusion-comet/issues/313).
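
As a minimal sketch of the configuration described above, the following Scala program sets both options when building a session. It assumes Spark is on the classpath and that Comet is already installed and enabled by other means. The object name, application name, and `local[*]` master are hypothetical choices for illustration; `spark.sql.ansi.enabled` is the standard Spark ANSI switch, and `spark.comet.ansi.enabled` is the option named in this guide.

```scala
import org.apache.spark.sql.SparkSession

object AnsiWithCometExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("comet-ansi-example") // hypothetical application name
      .master("local[*]") // assumption: a local run for experimentation
      // Standard Spark setting that turns on ANSI mode.
      .config("spark.sql.ansi.enabled", "true")
      // Setting from this guide: allow Comet to accelerate queries
      // while ANSI mode is on (experimental, not for production).
      .config("spark.comet.ansi.enabled", "true")
      .getOrCreate()

    // Read the values back to confirm they were applied to the session.
    println(spark.conf.get("spark.sql.ansi.enabled"))
    println(spark.conf.get("spark.comet.ansi.enabled"))

    spark.stop()
  }
}
```

The same keys can also be passed as `--conf` options when launching Spark, which avoids hard-coding an experimental setting in application code.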

## Cast

Cast operations in Comet fall into three levels of support:

- **Compatible**: The results match Apache Spark.
- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where other inputs
produce incorrect results or exceptions. The query stage falls back to Spark by default. Setting
`spark.comet.cast.allowIncompatible=true` allows all incompatible casts to run natively in Comet, but this is not
recommended for production use.
- **Unsupported**: Comet does not provide a native version of this cast expression, and the query stage falls back to
Spark.
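
The three levels and the fallback rule can be modeled in a few lines of Scala. This is a standalone sketch, not Comet's implementation: the level names mirror the ones used in this guide, while `CastFallbackSketch`, `runsNatively`, and the `allowIncompatible` parameter (standing in for `spark.comet.cast.allowIncompatible`) are hypothetical names introduced for illustration.

```scala
object CastFallbackSketch {
  // Support levels as described in the list above.
  sealed trait SupportLevel
  case object Compatible extends SupportLevel
  final case class Incompatible(reason: Option[String] = None) extends SupportLevel
  case object Unsupported extends SupportLevel

  // Returns true if the cast would run natively, false if the
  // query stage would fall back to Spark.
  def runsNatively(level: SupportLevel, allowIncompatible: Boolean): Boolean =
    level match {
      case Compatible      => true
      case Incompatible(_) => allowIncompatible // opt-in only
      case Unsupported     => false             // no native version exists
    }

  def main(args: Array[String]): Unit = {
    val levels = Seq(Compatible, Incompatible(Some("example reason")), Unsupported)
    for (level <- levels; allow <- Seq(false, true)) {
      println(s"$level, allowIncompatible=$allow -> native=${runsNatively(level, allow)}")
    }
  }
}
```

Modeling the levels as a sealed hierarchy lets the compiler check that every level is handled in the `match`.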

The following table shows the cast operations that Comet currently supports. Any cast that does not appear in this
table (such as those involving complex types or timestamp_ntz) is not supported by Comet.

| From Type | To Type | Compatible? | Notes |
|-|-|-|-|
| boolean | byte | Compatible | |
| boolean | short | Compatible | |
| boolean | integer | Compatible | |
| boolean | long | Compatible | |
| boolean | float | Compatible | |
| boolean | double | Compatible | |
| boolean | decimal | Unsupported | |
| boolean | string | Compatible | |
| boolean | timestamp | Unsupported | |
| byte | boolean | Compatible | |
| byte | short | Compatible | |
| byte | integer | Compatible | |
| byte | long | Compatible | |
| byte | float | Compatible | |
| byte | double | Compatible | |
| byte | decimal | Compatible | |
| byte | string | Compatible | |
| byte | binary | Unsupported | |
| byte | timestamp | Unsupported | |
| short | boolean | Compatible | |
| short | byte | Compatible | |
| short | integer | Compatible | |
| short | long | Compatible | |
| short | float | Compatible | |
| short | double | Compatible | |
| short | decimal | Compatible | |
| short | string | Compatible | |
| short | binary | Unsupported | |
| short | timestamp | Unsupported | |
| integer | boolean | Compatible | |
| integer | byte | Compatible | |
| integer | short | Compatible | |
| integer | long | Compatible | |
| integer | float | Compatible | |
| integer | double | Compatible | |
| integer | decimal | Compatible | |
| integer | string | Compatible | |
| integer | binary | Unsupported | |
| integer | timestamp | Unsupported | |
| long | boolean | Compatible | |
| long | byte | Compatible | |
| long | short | Compatible | |
| long | integer | Compatible | |
| long | float | Compatible | |
| long | double | Compatible | |
| long | decimal | Compatible | |
| long | string | Compatible | |
| long | binary | Unsupported | |
| long | timestamp | Unsupported | |
| float | boolean | Compatible | |
| float | byte | Unsupported | |
| float | short | Unsupported | |
| float | integer | Unsupported | |
| float | long | Unsupported | |
| float | double | Compatible | |
| float | decimal | Unsupported | |
| float | string | Incompatible | |
| float | timestamp | Unsupported | |
| double | boolean | Compatible | |
| double | byte | Unsupported | |
| double | short | Unsupported | |
| double | integer | Unsupported | |
| double | long | Unsupported | |
| double | float | Compatible | |
| double | decimal | Incompatible | |
| double | string | Incompatible | |
| double | timestamp | Unsupported | |
| decimal | boolean | Unsupported | |
| decimal | byte | Unsupported | |
| decimal | short | Unsupported | |
| decimal | integer | Unsupported | |
| decimal | long | Unsupported | |
| decimal | float | Compatible | |
| decimal | double | Compatible | |
| decimal | string | Unsupported | |
| decimal | timestamp | Unsupported | |
| string | boolean | Compatible | |
| string | byte | Compatible | |
| string | short | Compatible | |
| string | integer | Compatible | |
| string | long | Compatible | |
| string | float | Unsupported | |
| string | double | Unsupported | |
| string | decimal | Unsupported | |
| string | binary | Compatible | |
| string | date | Unsupported | |
| string | timestamp | Incompatible | Not all valid formats are supported |
| binary | string | Incompatible | |
| date | boolean | Unsupported | |
| date | byte | Unsupported | |
| date | short | Unsupported | |
| date | integer | Unsupported | |
| date | long | Unsupported | |
| date | float | Unsupported | |
| date | double | Unsupported | |
| date | decimal | Unsupported | |
| date | string | Compatible | |
| date | timestamp | Unsupported | |
| timestamp | boolean | Unsupported | |
| timestamp | byte | Unsupported | |
| timestamp | short | Unsupported | |
| timestamp | integer | Unsupported | |
| timestamp | long | Compatible | |
| timestamp | float | Unsupported | |
| timestamp | double | Unsupported | |
| timestamp | decimal | Unsupported | |
| timestamp | string | Compatible | |
| timestamp | date | Compatible | |
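
Rows in the format shown above can be produced by iterating over pairs of types and rendering one line per pair. The sketch below illustrates that idea in plain Scala under stated assumptions: `CastTableSketch` and its `isSupported` function are hypothetical stand-ins that only know three of the rows from the table, and type names are plain strings rather than Spark `DataType` values.

```scala
object CastTableSketch {
  sealed trait SupportLevel
  case object Compatible extends SupportLevel
  final case class Incompatible(reason: Option[String]) extends SupportLevel
  case object Unsupported extends SupportLevel

  // Hypothetical stand-in for the real support check; it covers
  // three rows from the table above and treats the rest as unsupported.
  def isSupported(from: String, to: String): SupportLevel = (from, to) match {
    case ("boolean", "byte")     => Compatible
    case ("string", "timestamp") => Incompatible(Some("Not all valid formats are supported"))
    case ("float", "string")     => Incompatible(None)
    case _                       => Unsupported
  }

  // Renders one markdown table row without a trailing newline.
  def row(from: String, to: String): String = isSupported(from, to) match {
    case Compatible =>
      s"| $from | $to | Compatible | |"
    case Incompatible(reason) =>
      s"| $from | $to | Incompatible | ${reason.map(_ + " ").getOrElse("")}|"
    case Unsupported =>
      s"| $from | $to | Unsupported | |"
  }

  // Builds the whole table; joining with "\n" keeps each row on its own line.
  def table(types: Seq[String]): String = {
    val header = Seq("| From Type | To Type | Compatible? | Notes |", "|-|-|-|-|")
    val rows = for { from <- types; to <- types if from != to } yield row(from, to)
    (header ++ rows).mkString("\n")
  }

  def main(args: Array[String]): Unit =
    println(table(Seq("boolean", "byte", "string", "timestamp")))
}
```

Each row has to end with a line break, otherwise the markdown table collapses onto a single line.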
9 changes: 7 additions & 2 deletions spark/pom.xml
@@ -58,6 +58,11 @@ under the License.
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
@@ -270,13 +275,13 @@ under the License.
<version>3.2.0</version>
<executions>
<execution>
<id>generate-config-docs</id>
<id>generate-user-guide-reference-docs</id>
<phase>package</phase>
<goals>
<goal>java</goal>
</goals>
<configuration>
<mainClass>org.apache.comet.CometConfGenerateDocs</mainClass>
<mainClass>org.apache.comet.GenerateDocs</mainClass>
<classpathScope>compile</classpathScope>
</configuration>
</execution>
41 changes: 31 additions & 10 deletions spark/src/main/scala/org/apache/comet/GenerateDocs.scala
@@ -1,11 +1,32 @@
package org.apache.comet
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

import org.apache.comet.expressions.{CometCast, Compatible, Incompatible, Unsupported}
import org.apache.spark.sql.catalyst.expressions.Cast
package org.apache.comet

import java.io.{BufferedOutputStream, FileOutputStream}

import scala.io.Source

import org.apache.spark.sql.catalyst.expressions.Cast

import org.apache.comet.expressions.{CometCast, Compatible, Incompatible, Unsupported}

/**
* Utility for generating markdown documentation from the configs.
*
@@ -15,10 +36,10 @@ object GenerateDocs {

def main(args: Array[String]): Unit = {
generateConfigReference()
generateCompatReference()
generateCompatibilityGuide()
}

def generateConfigReference(): Unit = {
private def generateConfigReference(): Unit = {
val templateFilename = "docs/source/user-guide/configs-template.md"
val outputFilename = "docs/source/user-guide/configs.md"
val w = new BufferedOutputStream(new FileOutputStream(outputFilename))
@@ -38,7 +59,7 @@ object GenerateDocs {
w.close()
}

def generateCompatReference(): Unit = {
private def generateCompatibilityGuide(): Unit = {
val templateFilename = "docs/source/user-guide/compatibility-template.md"
val outputFilename = "docs/source/user-guide/compatibility.md"
val w = new BufferedOutputStream(new FileOutputStream(outputFilename))
@@ -53,13 +74,13 @@ object GenerateDocs {
val toTypeName = toType.typeName.replace("(10,2)", "")
CometCast.isSupported(fromType, toType, None, "LEGACY") match {
case Compatible =>
w.write(s"| $fromTypeName | $toTypeName | Compatible | |".getBytes)
w.write(s"| $fromTypeName | $toTypeName | Compatible | |\n".getBytes)
case Incompatible(Some(reason)) =>
w.write(s"| $fromTypeName | $toTypeName | Incompatible | $reason |".getBytes)
w.write(s"| $fromTypeName | $toTypeName | Incompatible | $reason |\n".getBytes)
case Incompatible(None) =>
w.write(s"| $fromTypeName | $toTypeName | Incompatible | |.getBytes")
w.write(s"| $fromTypeName | $toTypeName | Incompatible | |\n".getBytes)
case Unsupported =>
w.write(s"| $fromTypeName | $toTypeName | Unsupported | |".getBytes)
w.write(s"| $fromTypeName | $toTypeName | Unsupported | |\n".getBytes)
}
}
}
@@ -36,7 +36,7 @@ object Unsupported extends SupportLevel

object CometCast {

val supportedTypes =
def supportedTypes: Seq[DataType] =
Seq(
DataTypes.BooleanType,
DataTypes.ByteType,
@@ -576,7 +576,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {
val value = cast.eval()
exprToProtoInternal(Literal(value, dataType), inputs)

case cast @ Cast(child, dt, timeZoneId, evalMode) =>
case Cast(child, dt, timeZoneId, evalMode) =>
val childExpr = exprToProtoInternal(child, inputs)
if (childExpr.isDefined) {
val evalModeStr = if (evalMode.isInstanceOf[Boolean]) {
@@ -590,7 +590,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {
CometCast.isSupported(child.dataType, dt, timeZoneId, evalModeStr)

def getIncompatMessage(reason: Option[String]) =
s"Comet does not guarantee correct results for cast " +
"Comet does not guarantee correct results for cast " +
s"from ${child.dataType} to $dt " +
s"with timezone $timeZoneId and evalMode $evalModeStr" +
reason.map(str => s" ($str)").getOrElse("")