Support libai DETR project #260

Open · wants to merge 146 commits into main
Conversation

@HiHippie (Contributor) commented Apr 12, 2022

TODO LIST:

  • coco_dataset preprocessing
  • modeling
  • trainer
  • torch weight loading test (aligned)
  • eager global tensor-parallel evaluation results aligned
  • Switch to libai's transformer implementation; the current version borrows heavily from torch.nn.MultiHeadAttention
  • Push training forward

Record of OneFlow bugs and unsupported operators

  1. The oneflow min/max ops cannot be executed across different data types (minimal repro in the Apr 14 comment below)
  2. flow.cumsum / tensor.cumsum (the tensor method is missing and the dtype argument is rejected; details in the Apr 20 comment below)
  3. nn.MultiHeadAttention (not available)
  4. flow.cdist (not available)
  5. flow.as_tensor cannot explicitly specify the data type when converting from a numpy array
  6. flow.full_like (not available)
  7. for m in tensor: m[0] = False does not actually change the tensor's values (see the sketch after this list)
  8. tensor.copy_() has no effect
  9. F.interpolate behaves inconsistently with PyTorch
  10. tensor.split has a bug when split_size_or_sections = [x, 0] (see the sketch after this list)
  11. flow.ByteStorage (not available)
  12. tensor.unbind on a global tensor raises NotImplementedError
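
A minimal sketch of how items 7 and 10 were observed; the shapes, dtypes, and values below are illustrative only and not taken from the DETR code.

# Sketch reproducing items 7 and 10 above.
import oneflow as flow

# Item 7: assigning into rows while iterating should zero the first column
# (as in PyTorch), but the outer tensor stays unchanged on the affected build.
mask = flow.ones(3, 4, dtype=flow.bool)
for m in mask:
    m[0] = False
print(mask)

# Item 10: tensor.split with a zero-sized trailing section.
x = flow.randn(5, 4)
parts = x.split([5, 0])  # torch returns a (5, 4) chunk plus an empty (0, 4) chunk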


@HiHippie (Contributor, Author)

from flowvision.models._utils import IntermediateLayerGetter (the flowvision counterpart of torchvision.models._utils.IntermediateLayerGetter) is not supported

Is this missing from flowvision? If so, I'll update flowvision and publish a tagged release.

Yeah, it's not supported ~
Haha, sounds good ~ I was just about to work around it for now.

Looks like it is actually supported here: https://github.com/Oneflow-Inc/vision/blob/main/flowvision/models/layer_getter.py, it's probably just that the file name doesn't match torchvision's (haha)

Got it ~
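
For reference, a usage sketch assuming flowvision keeps torchvision's class name and signature and exposes it from the linked layer_getter.py module; the module path and the return_layers mapping below are assumptions, not confirmed by this thread.

# Sketch only: assumes flowvision mirrors torchvision's IntermediateLayerGetter
# and that the class lives in flowvision/models/layer_getter.py as linked above.
import oneflow as flow
import flowvision
from flowvision.models.layer_getter import IntermediateLayerGetter

backbone = flowvision.models.resnet50(pretrained=False)
# Map backbone sub-module names to output keys, as DETR does for its backbone.
body = IntermediateLayerGetter(backbone, return_layers={"layer4": "0"})

x = flow.randn(1, 3, 224, 224)
features = body(x)  # dict of feature maps keyed by "0"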

@HiHippie (Contributor, Author) commented Apr 14, 2022

OneFlow's min/max ops cannot be executed across different data types.

>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'

Minimal reproduction, using float64 vs. float32 as an example; other mixed-dtype pairs fail the same way.

torch

>>> import torch
>>> x = torch.randn(5, dtype=torch.float32)
>>> y = torch.randn(5, dtype=torch.float64)
>>> torch.max(x,y)
tensor([ 1.1421,  1.2252,  0.3676,  1.0047, -0.0242], dtype=torch.float64)
>>> torch.min(x,y)
tensor([-0.4623, -0.1920, -0.8689, -0.4471, -0.2798], dtype=torch.float64)

oneflow

>>> x = flow.randn(5, dtype=flow.float32)
>>> y = flow.randn(5, dtype=flow.float64)
>>> flow.max(x,y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
    Dispatch<TensorTuple>(op_expr, inputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
    Dispatch(op_expr, inputs, outputs.get(), ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
    user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
    dtype_infer_fn_(&infer_ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
    Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)

>>> flow.min(x,y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 139, in Dispatch<oneflow::one::Tensor>
    Dispatch<TensorTuple>(op_expr, inputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter_util.cpp", line 131, in Dispatch<oneflow::one::TensorTuple>
    Dispatch(op_expr, inputs, outputs.get(), ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/op_interpreter.cpp", line 96, in Apply
    internal_->Apply(op_expr, inputs, outputs, ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_interpreter/eager_mirrored_op_interpreter.cpp", line 139, in NaiveInterpret
    user_op_expr.InferPhysicalShapeAndDType( attrs, device_tag ... TensorMeta* { return output_tensor_metas->at(i); })
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/op_expr.cpp", line 445, in InferPhysicalShapeAndDType
    dtype_infer_fn_(&infer_ctx)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/framework/infer_util.cpp", line 54, in UnchangedDataType
    Check failed: (tensor_desc.data_type()) == (first_tensor_desc->data_type()) (3 vs 2)
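
Until mixed-dtype min/max is supported, one possible workaround (a sketch, not something taken from this PR) is to promote both operands to a common dtype first, which is what torch does implicitly:

# Workaround sketch: manually promote float32 -> float64 before the elementwise
# min/max, mimicking torch's implicit type promotion.
import oneflow as flow

x = flow.randn(5, dtype=flow.float32)
y = flow.randn(5, dtype=flow.float64)
z_max = flow.max(x.to(flow.float64), y)
z_min = flow.min(x.to(flow.float64), y)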

Synced to https://github.com/Oneflow-Inc/OneTeam/issues/1207

@HiHippie (Contributor, Author) commented Apr 20, 2022

flow.cumsum is supported, but tensor.cumsum is not:

>>> flow.__version__
'0.8.0.dev20220411+cu102'
>>> torch.__version__
'1.11.0+cu102'
>>> x = flow.randn(10,10,10)
>>> y = flow.cumsum(x,1)
>>> y = x.cumsum(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'oneflow._oneflow_internal.Tensor' object has no attribute 'cumsum'
>>> x = torch.randn(10,10,10)
>>> y = torch.cumsum(x,1)
>>> y = x.cumsum(1)

The dtype argument cannot be specified:

>>> x = flow.randn(5,5)
>>> flow.cumsum(x,dim=0)
tensor([[ 0.0508,  1.0346, -0.7175, -0.2991,  0.7678],
        [ 0.4012,  2.2157, -1.1069,  0.7856,  2.3732],
        [-0.6691,  1.7376, -0.2673,  0.8270,  2.3241],
        [ 0.6488,  2.2601, -1.5217,  1.0009,  2.4177],
        [ 1.0917,  1.9483, -1.0218, -0.4837,  3.5062]], dtype=oneflow.float32)
>>> flow.cumsum(x,dim=0,dtype=flow.float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
oneflow._oneflow_internal.exception.Exception: 
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/api/python/functional/py_function.cpp", line 40, in ReportKwargsError
    TypeError: cumsum(): got multiple values for argument 'dim'
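
A workaround sketch (not taken from this PR): use the functional flow.cumsum and cast the result afterwards, since on this build the Tensor method is missing and the dtype keyword is rejected.

# Workaround sketch: flow.cumsum works without dtype=, so cast the result
# instead of passing dtype, and avoid the missing tensor.cumsum method.
import oneflow as flow

x = flow.randn(5, 5)
y = flow.cumsum(x, dim=0).to(flow.float64)  # instead of x.cumsum(0, dtype=flow.float64)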

@yuanms2 commented Aug 24, 2022

Ziqiu, please keep track of whether the issues you reported in the DETR work have been fixed.

@HiHippie (Contributor, Author)

Ziqiu, please keep track of whether the issues you reported in the DETR work have been fixed.

OK, Prof. Yuan.
