-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function delete from Deltabale where exist in dataframe + update Readme #66
base: main
Are you sure you want to change the base?
Conversation
…taframe Add function delete from another dataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! good effort here, please review my comments and use the scalafmt.conf
@@ -2,7 +2,8 @@ target/ | |||
lib_managed/ | |||
src_managed/ | |||
project/boot/ | |||
|
|||
project | |||
.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to ignore all the files/folders that start we period, because we need to keep versioning .github, .scalafmt.conf, etc. If you want to ignore a file that is not needed, please be specific.
) : Int = | ||
{ | ||
|
||
var returnValue : Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In functional programming we avoid the use of mutable values, please change this to be val
. You could simply return the value of numTargetRowsDeleted, if not rows were affected that should return 0.
.execute() | ||
|
||
returnValue = Integer.parseInt(targetTableDelta.history(1).select("operationMetrics.numTargetRowsDeleted").first().getAs[String]("numTargetRowsDeleted").toString()) | ||
return returnValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have to use the return keyword in scala, the last expression will be returned. So Integer.parseInt(targetTableDelta.history(1).select("operationMetrics.numTargetRowsDeleted").first().getAs[String]("numTargetRowsDeleted").toString())
alone will do the job.
@@ -729,4 +729,51 @@ class DeltaHelperSpec | |||
orderedComparison = false) | |||
} | |||
} | |||
describe("Remove rows from DeltaTable where rows exists in a Source Dataframe") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add unit test of failure scenarios as well, for example what would happen if the operator send is invalid or the column does not exists in the table, etc.
|
||
var returnValue : Int = 0 | ||
|
||
Try{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Try,Success,Failure
pattern should be used to return a Try
not a concrete type, for example
def deleteFromAnotherDataframe(
targetTableDelta: io.delta.tables.DeltaTable,
sourceDF: org.apache.spark.sql.DataFrame,
attrColNameOper: Map[String,String]) : Try[Int] = {
previous code...
previous code...
Try{
your code here....
}
}
And the users consuming your api are the ones that will perform the Sucess or failure pattern, like this
deleteFromAnotherDataframe(x,y,z) match {
case Success(i) => println("do success actions here.")
case Failure(s) => println("handle errors here.")
}
I would advice in this case that insted of Try,Success,Failure
use try,catch
and handle specific errors like invalid columns or invalid operators with a JodieValidationError message.
Add function delete from Deltabale where exist in dataframe + update Readme