0.4 bugfixes #159

Merged
merged 19 commits into from
Jul 15, 2024
Changes from 9 commits
Commits
d92ddb3
Remove an unused test, add 1 failing test for #157
korenmiklos Jul 11, 2024
90a740b
False alarm, was just using `split` wrong. wontfix #157
korenmiklos Jul 11, 2024
3926fd6
Check in 3 tests for calling functions in other modules
korenmiklos Jul 12, 2024
9676aa7
Introspect function methods in Main scope (#133)
korenmiklos Jul 12, 2024
af906a5
Don't even try evaluating in global scope, only in Main (fix #133)
korenmiklos Jul 12, 2024
b9453c0
Introduce tests for type promotion (#148)
korenmiklos Jul 12, 2024
5462909
Use `promote_type` for type promotion when replacing a variable
korenmiklos Jul 12, 2024
f04cfd2
Introduce tests for #158, 1 fails
korenmiklos Jul 12, 2024
0d600b9
Create _n and _N only once (#158)
korenmiklos Jul 12, 2024
78f4925
Additional tests to check _n and _N behavior when both if and by are …
korenmiklos Jul 12, 2024
3535ed3
refactor: use Main.eval(), not eval()
korenmiklos Jul 13, 2024
1727f42
Merge pull request #160 from codedthinking/Main.eval
korenmiklos Jul 13, 2024
a59ce49
Merge remote-tracking branch 'origin/0.4-bugfixes' into 0.4-bugfixes
korenmiklos Jul 13, 2024
85dbb94
Add tests to modular function vectorization
korenmiklos Jul 13, 2024
b7d3305
Push _n and _N to df before `@if` but after `by`
korenmiklos Jul 13, 2024
abfb65d
Add tests for helper functions
korenmiklos Jul 13, 2024
c9e2f4a
x.y is not a valid variable reference
korenmiklos Jul 13, 2024
0df943e
All known bugs fixed, bump version
korenmiklos Jul 13, 2024
21e3303
Remove unused code and add tests to improve code coverage
korenmiklos Jul 13, 2024
9 changes: 3 additions & 6 deletions docs/src/index.md
@@ -86,20 +86,17 @@ See the benchmarking code for [Stata](https://github.com/codedthinking/Kezdi.jl/
 The function can operate on individual elements,
 ```julia
 get_make(text) = split(text, " ")[1]
-@generate Make = Main.get_make(Model)
+@generate Make = get_make(Model)
 ```
 or on the entire column:
 ```julia
-function geometric_mean(x::AbstractVector)
+function geometric_mean(x::Vector)
     n = length(x)
     return exp(sum(log.(x)) / n)
 end
-@collapse geom_NPG = Main.geometric_mean(MPG), by(Cylinders)
+@collapse geom_NPG = geometric_mean(MPG), by(Cylinders)
 ```

-!!! tip "Note: `Main.` prefix"
-    If you define a function in your own code, you need to prefix the function name with `Main.` to use it in other commands. To make use of [Automatic vectorization](@ref), make sure to give the function a vector argument type.
-
 ## Commands

 ### Setting and inspecting the global DataFrame
64 changes: 64 additions & 0 deletions goals.md
@@ -266,3 +266,67 @@ An in-place version of `@with!` should do everything in place. This can mean all
- non-standard evaluation makes it hard to wrap Kezdi.jl code in functions
6. For loops
- implement `scalars()` and automatic expansion of locals in context

# 2024-07-12 In-flight debugging session
```julia
julia> using Kezdi

julia> module MyModule
myfunc(x) = 2x
end
Main.MyModule

julia> df = DataFrame(x = 1:10)
10×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
3 │ 3
4 │ 4
5 │ 5
6 │ 6
7 │ 7
8 │ 8
9 │ 9
10 │ 10

julia> @with df @generate y = MyModule.myfunc(x)
10×2 DataFrame
Row │ x y
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 2 4
3 │ 3 6
4 │ 4 8
5 │ 5 10
6 │ 6 12
7 │ 7 14
8 │ 8 16
9 │ 9 18
10 │ 10 20
```

How about an aggregator function?

```julia
julia> module MyModule
myfunc(x) = 2x
myaggreg(v::Vector) = sum(x.^2)
end
WARNING: replacing module MyModule.
Main.MyModule

julia> @with df @egen y = MyModule.myaggreg(x)
┌ Warning: transform!(var"##237", [:x] => (((x,)->(passmissing(MyModule.myaggreg)).(x)) => $(QuoteNode("y"))))
└ @ Kezdi ~/Tresorit/Mac/code/julia/Kezdi.jl/src/commands.jl:100
ERROR: MethodError: no method matching myaggreg(::Int64)

Closest candidates are:
myaggreg(::Vector)
@ Main.MyModule REPL[8]:3
```

This means the call was vectorized at compile time, when `MyModule.myaggreg` could not be resolved, but the function is found at runtime.
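The transcript above can be reproduced without Kezdi. The following is a minimal sketch, assuming a hypothetical module `FakePackage` that stands in for a package; `resolve_locally` and `resolve_in_main` are illustrative names, not Kezdi functions. It suggests why a bare `eval` inside a package module fails to see a module defined in `Main`, while `Main.eval` resolves it.

```julia
# A user-defined module living in Main, as in the REPL session.
module MyModule
    myaggreg(v::Vector) = sum(v .^ 2)
end

# Hypothetical stand-in for a package module (an assumption, not Kezdi itself).
module FakePackage
    # A bare `eval` here is FakePackage.eval, so names are looked up in FakePackage.
    function resolve_locally(ex)
        try
            return eval(ex)
        catch e
            e isa UndefVarError || rethrow()
            return nothing   # the name is not visible from inside the package
        end
    end

    # Main.eval looks names up in Main, where user code is defined.
    resolve_in_main(ex) = Main.eval(ex)
end

ex = :(MyModule.myaggreg)
println(FakePackage.resolve_locally(ex) === nothing)             # lookup fails in FakePackage
println(FakePackage.resolve_in_main(ex) === MyModule.myaggreg)   # lookup succeeds in Main
```

When the lookup fails, a method check falls back to "not a vector function", which is consistent with the element-wise `passmissing(...).(x)` call shown in the warning.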
30 changes: 9 additions & 21 deletions src/codegen.jl
@@ -244,9 +244,10 @@
     end
 end

+get_dot_parts(ex::Symbol) = [ex]
Codecov / codecov/patch: added line #L247 was not covered by tests.
 function get_dot_parts(ex::Expr)
     is_dot_reference(ex) || error("Expected a dot reference, got $ex")
-    parts = []
+    parts = Symbol[]
Codecov / codecov/patch: added line #L250 was not covered by tests.
     while is_dot_reference(ex)
         push!(parts, ex.args[2].value)
         ex = ex.args[1]
@@ -274,28 +275,15 @@
 isalphanumeric(str::AbstractString) = all(isalphanumeric, str)

 isassignment(expr::Any) = expr isa Expr && expr.head == :(=) && length(expr.args) == 2
-function operates_on_vector(expr::Any)
-    try
-        length(methodswith(Vector, eval(expr); supertypes=true)) > 0
-    catch e
-        if isa(e, UndefVarError)
-            return false
-        else
-            rethrow(e)
-        end
-    end
-end
+operates_on_missing(expr::Any) = (expr isa Symbol && expr == :ismissing) || operates_on_type(expr, Missing)
+operates_on_vector(expr::Any) = operates_on_type(expr, Vector)

-function operates_on_missing(expr::Any)
-    expr isa Symbol && expr == :ismissing && return true
+function operates_on_type(expr::Any, T::Type)
     try
-        length(methodswith(Missing, eval(expr); supertypes=true)) > 0
-    catch e
-        if isa(e, UndefVarError)
-            return false
-        else
-            rethrow(e)
-        end
+        return length(methodswith(T, Main.eval(expr); supertypes=true)) > 0
+    catch ee
+        !isa(ee, UndefVarError) && rethrow(ee)
+        return false
Codecov / codecov/patch: added lines #L285 - L286 were not covered by tests.
     end
 end

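The method check in this diff relies on `methodswith` from the `InteractiveUtils` standard library. Below is a minimal sketch of the same idea, assuming a hypothetical helper `accepts_type` that takes the function object directly instead of an expression, and a hypothetical module `Demo` modelled on the test module in this PR.

```julia
using InteractiveUtils: methodswith

# Hypothetical functions, modelled on the PR's test module.
module Demo
    double(x) = 2x                     # untyped argument
    sumsq(v::Vector) = sum(v .^ 2)     # explicitly takes a Vector
    keepmissing(::Missing) = missing   # explicitly takes Missing
    keepmissing(x) = 3x
end

# True if some method of `f` has an argument typed as `T` or as a supertype
# of `T`; `methodswith` excludes arguments typed `Any` from the match.
accepts_type(f, T::Type) = !isempty(methodswith(T, f; supertypes=true))

println(accepts_type(Demo.sumsq, Vector))         # a method takes a Vector
println(accepts_type(Demo.double, Vector))        # only an untyped method exists
println(accepts_type(Demo.keepmissing, Missing))  # a method takes Missing
```

Under this check, a function is treated as an aggregator only when it declares a vector-typed argument; an untyped `f(x)` is not matched and would be broadcast instead.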
7 changes: 4 additions & 3 deletions src/commands.jl
@@ -44,10 +44,11 @@ function rewrite(::Val{:replace}, command::Command)
         else
             $setup
             eltype_RHS = $RHS isa AbstractVector ? eltype($RHS) : typeof($RHS)
-            if eltype_RHS != eltype($target_df[!, $target_column])
-                local $third_vector = Vector{eltype_RHS}(undef, nrow($local_copy))
+            eltype_LHS = eltype($local_copy[.!$bitmask, $target_column])
+            if eltype_RHS != eltype_LHS
+                local $third_vector = Vector{promote_type(eltype_LHS, eltype_RHS)}(undef, nrow($local_copy))
                 $third_vector[$bitmask] .= $RHS
-                $third_vector[.!$bitmask] .= $local_copy[!, $target_column][.!$bitmask]
+                $third_vector[.!$bitmask] .= $local_copy[.!$bitmask, $target_column]
                 $local_copy[!, $target_column] = $third_vector
             else
                 $target_df[!, $target_column] .= $RHS
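The change above builds the replacement column with `promote_type` so that rows outside the mask keep their values. A minimal sketch on plain vectors follows, assuming a hypothetical helper `replace_where` (not part of Kezdi) that mirrors the masked assignment without DataFrames.

```julia
# Base's promotion rules for the element types exercised by the new tests.
println(promote_type(Int, Float64))   # Float64
println(promote_type(Int, Missing))   # Union{Missing, Int64}
println(promote_type(Int, String))    # Any

# Hypothetical helper: overwrite `old` where `mask` is true, widening the
# element type so that both the old and the new values fit.
function replace_where(old::AbstractVector, mask::AbstractVector{Bool}, rhs)
    rhs_type = rhs isa AbstractVector ? eltype(rhs) : typeof(rhs)
    T = promote_type(eltype(old), rhs_type)
    out = Vector{T}(undef, length(old))
    out[mask] .= rhs               # new values on the selected rows
    out[.!mask] .= old[.!mask]     # untouched rows keep their old values
    return out
end

println(replace_where([1, 2, 3], [true, false, false], 1.1))      # [1.1, 2.0, 3.0]
println(replace_where([1, 2, 3], [true, false, false], missing))
```

If `rhs` is a vector, its length has to equal the number of `true` entries in `mask`. Allocating with only the right-hand-side type, as the removed line did, would fail for a case like `missing` replacing one row of an `Int` column, because the remaining integers cannot be stored in a `Vector{Missing}`.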
13 changes: 13 additions & 0 deletions test/codegen.jl
@@ -1,3 +1,10 @@
+module MyModule
+    myfunc(x) = 2x
+    myaggreg(v::Vector) = sum(x.^2)
+    mymiss(::Missing) = missing
+    mymiss(x) = 3x
+end
+
 @testset "Replace variable references" begin
     @test_expr replace_variable_references(:(x + y + f(z) - g.(x))) == :(:x + :y + f(:z) - g.(:x))
     @test_expr replace_variable_references(:(f(x, <=))) == :(f(:x, <=))
@@ -39,4 +46,10 @@ end
     @testset "Unknown functions are passed through `passmissing`" begin
         @test_expr vectorize_function_calls(:(y = Dates.year(x))) == :(y = (passmissing(Dates.year)).(x))
     end
+    @testset "Functions in other modules" begin
+        using .MyModule
+        @test vectorize_function_calls(:(MyModule.myfunc(x))) == :((passmissing(MyModule.myfunc)).(x))
+        @test vectorize_function_calls(:(MyModule.myaggreg(x))) == :(MyModule.myaggreg(keep_only_values(x)))
+        @test vectorize_function_calls(:(MyModule.mymiss(x))) == :(MyModule.mymiss.(x))
+    end
 end
20 changes: 16 additions & 4 deletions test/commands.jl
@@ -50,10 +50,9 @@
         df = DataFrame(x=[[1, 2], [3, 4], [5, 6], [7, 8]])
         @test (@with df @generate x1 = getindex(x, 1)).x1 == [1, 3, 5, 7]
         @test (@with df @generate x2 = getindex(x, 2)).x2 == [2, 4, 6, 8]
-    end
-
-    @testset "Error handling" begin
-        @test_throws Exception @with df @generate x = 1
+        df = DataFrame(text = ["a,b", "c,d,e", "f"])
+        df2 = @with df @generate n_terms = length.(split.(text, ","))
+        @test df2.n_terms == [2, 3, 1]
     end
 end

@@ -87,6 +86,19 @@ end
         @test eltype(df.x) == eltype(df3.x)
     end

+    @testset "Mixed types" begin
+        df = DataFrame(x=[1, 2, 3])
+        @test eltype((@with df @replace x = 1.1 @if _n == 1).x) <: AbstractFloat
+        @test eltype((@with df @replace x = missing @if _n == 1).x) == Union{Missing, Int}
+        @test eltype((@with df @replace x = "a" @if _n == 1).x) == Any
+        df = DataFrame(x=[missing, 2, 3])
+        @test eltype((@with df @replace x = 1 @if _n == 1).x) == Union{Int, Missing}
+        df = DataFrame(x=[1.1, 2, 3])
+        @test eltype((@with df @replace x = 1 @if _n == 1).x) <: AbstractFloat
+        df = DataFrame(x=[1, 2, missing])
+        @test eltype((@with df @replace x = 1.1 @if _n == 1).x) <: Union{T, Missing} where T <: AbstractFloat
+    end
+
     @testset "Error handling" begin
         @test_throws Exception @with df @replace y = 1
     end