Dear tic member,
Please see the report from SIG MindSpore Data.
# MindSpore Data Special Interest Group (SIG)
This is the working repo for the Data special interest group (SIG). It contains all
the artifacts, materials, meeting notes, and proposals regarding data processing (dataset)
and data format (mindrecord) in MindSpore. Feedback and contributions are welcome.
Data Processing: This is the Dataset component. It reads the user's data into a
Dataset, applies data augmentation and processing operations (such as resize, onehot,
rotate, shuffle, and batch), and finally provides the Dataset to the training process.
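The read, transform, shuffle, and batch flow described above can be sketched in plain Python. This is only an illustration of the pipeline idea under simplifying assumptions, not the MindSpore implementation or its API: the helper names (`map_op`, `shuffle_op`, `batch_op`, `one_hot`) and the sample layout are hypothetical.

```python
import random
from typing import Any, Callable, Iterable, Iterator, List


def map_op(source: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Apply a per-sample transform (for example onehot or resize)."""
    for sample in source:
        yield fn(sample)


def shuffle_op(source: Iterable[Any], buffer_size: int, seed: int = 0) -> Iterator[Any]:
    """Buffered shuffle: keep up to buffer_size samples and emit a random one."""
    rng = random.Random(seed)
    buffer: List[Any] = []
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= buffer_size:
            idx = rng.randrange(len(buffer))
            # Move the chosen sample to the end so it can be popped cheaply.
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch_op(source: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Group consecutive samples into lists of batch_size (last one may be short)."""
    batch: List[Any] = []
    for sample in source:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode an integer label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Hypothetical in-memory "user data": (features, integer label) pairs.
raw = [([float(i)], i % 3) for i in range(10)]

encoded = map_op(raw, lambda s: (s[0], one_hot(s[1], 3)))
shuffled = shuffle_op(encoded, buffer_size=4)
batches = list(batch_op(shuffled, batch_size=4))
```

Each stage is a generator, so samples flow through the pipeline lazily and the training loop only pulls the batches it needs.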
Data Format: This component normalizes the user's training data into a unified
format (MindRecord). The user defines the training data schema and calls the Python
API to convert the training data into MindRecord data. The MindRecord data is then
read into a Dataset through MindDataset and provided to the training process.
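The schema-then-convert-then-read workflow above can be illustrated with a small stand-in. This sketch does not use the MindRecord file format or the MindSpore API: it assumes a hypothetical schema expressed as a Python dict and uses a JSON-lines file as the unified format, purely to show the three steps (define schema, write validated records, read them back).

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"file_name": str, "label": int, "data": bytes}


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Reject records whose fields or types do not match the schema."""
    if set(record) != set(schema):
        raise ValueError(f"fields {sorted(record)} do not match schema {sorted(schema)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"field {name!r} must be {expected.__name__}")


def write_records(path: str, schema: Dict[str, type], records: Iterable[Dict[str, Any]]) -> None:
    """Validate each record and append it to the file in one unified layout."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            validate(record, schema)
            # bytes are not JSON-serializable, so store them as hex strings.
            row = {k: (v.hex() if isinstance(v, bytes) else v) for k, v in record.items()}
            f.write(json.dumps(row) + "\n")


def read_records(path: str, schema: Dict[str, type]) -> Iterator[Dict[str, Any]]:
    """Read records back, restoring bytes fields according to the schema."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            yield {k: (bytes.fromhex(v) if schema[k] is bytes else v) for k, v in row.items()}


records = [
    {"file_name": "img_0.jpg", "label": 0, "data": b"\x00\x01"},
    {"file_name": "img_1.jpg", "label": 1, "data": b"\x02\x03"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    write_records(path, SCHEMA, records)
    restored = list(read_records(path, SCHEMA))
```

Validating against the schema at write time means the reader can trust every record to have the same fields and types.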
# SIG Leads
Liu Cunwei (Huawei)
SIG leads will drive the meeting.
Meeting announcements will be posted on our Gitee channel:
Feedback and topic requests are welcome from all.
Slack channel: https://app.slack.com/client/TUKCY4QDR/C010RPN6QNP?cdn_fallback=2
Documents and artifacts: https://gitee.com/mindspore/community/tree/master/sigs/data
# Meeting notes
Thursday April 2, 2020
# Current Progress
* Support multi-process execution of GeneratorDataset/PyFunc for high performance
* Support variable batch size
* Support new Dataset operators, such as filter, skip, take, and TextLineDataset
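To make the semantics of the filter, skip, and take operators and of a variable batch size concrete, here is a minimal plain-Python sketch. It is not the MindSpore API: the function names are hypothetical, and it assumes the batch size is chosen by a user-supplied function of the batch index.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def filter_op(source: Iterable[Any], predicate: Callable[[Any], bool]) -> Iterator[Any]:
    """Keep only the samples for which predicate returns True."""
    return (sample for sample in source if predicate(sample))


def skip_op(source: Iterable[Any], count: int) -> Iterator[Any]:
    """Drop the first count samples."""
    return islice(source, count, None)


def take_op(source: Iterable[Any], count: int) -> Iterator[Any]:
    """Keep at most the first count samples."""
    return islice(source, count)


def variable_batch(
    source: Iterable[Any], batch_size_fn: Callable[[int], int]
) -> Iterator[List[Any]]:
    """Batch samples, asking batch_size_fn for the size of each batch in turn."""
    it = iter(source)
    batch_index = 0
    while True:
        size = batch_size_fn(batch_index)
        if size < 1:
            raise ValueError("batch size must be at least 1")
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
        batch_index += 1


# Even numbers from 0..19, skip the first two, then keep six of them.
evens = filter_op(range(20), lambda x: x % 2 == 0)
selected = list(take_op(skip_op(evens, 2), 6))

# Batch sizes grow with the batch index: 1, 2, 3, ...
batches = list(variable_batch(selected, lambda i: i + 1))
```

Here `selected` is `[4, 6, 8, 10, 12, 14]` and `batches` is `[[4], [6, 8], [10, 12, 14]]`.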