Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function delete from Deltabale where exist in dataframe + update Readme #66

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ilyasse05
Copy link

Add function delete from Deltabale where exist in dataframe + update Readme

mohamed.samaoul-ext and others added 3 commits March 12, 2023 14:40
Copy link
Collaborator

@brayanjuls brayanjuls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! good effort here, please review my comments and use the scalafmt.conf

@@ -2,7 +2,8 @@ target/
lib_managed/
src_managed/
project/boot/

project
.*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to ignore all the files/folders that start we period, because we need to keep versioning .github, .scalafmt.conf, etc. If you want to ignore a file that is not needed, please be specific.

) : Int =
{

var returnValue : Int = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In functional programming we avoid the use of mutable values, please change this to be val. You could simply return the value of numTargetRowsDeleted, if not rows were affected that should return 0.

.execute()

returnValue = Integer.parseInt(targetTableDelta.history(1).select("operationMetrics.numTargetRowsDeleted").first().getAs[String]("numTargetRowsDeleted").toString())
return returnValue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have to use the return keyword in scala, the last expression will be returned. So Integer.parseInt(targetTableDelta.history(1).select("operationMetrics.numTargetRowsDeleted").first().getAs[String]("numTargetRowsDeleted").toString()) alone will do the job.

@@ -729,4 +729,51 @@ class DeltaHelperSpec
orderedComparison = false)
}
}
describe("Remove rows from DeltaTable where rows exists in a Source Dataframe") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add unit test of failure scenarios as well, for example what would happen if the operator send is invalid or the column does not exists in the table, etc.


var returnValue : Int = 0

Try{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Try,Success,Failure pattern should be used to return a Try not a concrete type, for example

def deleteFromAnotherDataframe(
        targetTableDelta: io.delta.tables.DeltaTable,
        sourceDF: org.apache.spark.sql.DataFrame,
        attrColNameOper: Map[String,String]) : Try[Int] = {
        previous code...
        previous code...
       Try{
           your code here....
       }
}

And the users consuming your api are the ones that will perform the Sucess or failure pattern, like this

deleteFromAnotherDataframe(x,y,z) match {
    case Success(i) => println("do success actions here.")
    case Failure(s) => println("handle errors here.")
}

I would advice in this case that insted of Try,Success,Failure use try,catch and handle specific errors like invalid columns or invalid operators with a JodieValidationError message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants