[Enhancement]: Added csv file support in bulkinsert function. #34937

Closed
1 task done
OxalisCu opened this issue Jul 23, 2024 · 3 comments
Labels: kind/enhancement (issues or changes related to enhancement), stale (indicates no updates for 30 days)

Comments

@OxalisCu
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

Added csv file support in bulkinsert function.

Supported csv file formats

The first row is the data column name, the other rows are the data.

id,vector,str,$meta
0,[0.4441574961032211,0.42256641558870456,0.5124223554286594,0.7879353699502213,0.10435004213789101,0.6558279547575208,0.8542535247754479,0.18052249999454162,0.74069758774751,0.671644191618406,0.8620709906438482,0.5662227553924041,0.6845673175571894,0.8208928281552716,0.33790201013426213,0.18988124161684228,0.10202593045231945,0.1471870143328713,0.06869036657075356,0.5667205215887225,0.04236414073008221,0.8178529098738148,0.8105205497340374,0.7281428237099189,0.8932990302967517,0.02308608910090093,0.024253476634125914,0.6599488157898903,0.5721126696281245,0.19410768611853624,0.24631872543995448,0.7741316741606677,0.0018173565224902655,0.5951492574858782,0.9691374700417137,0.3357226229891316,0.031663538177447825,0.35641259931933944,0.8736067240547148,0.04139773225935284,0.12395218107224193,0.02866541852379012,0.690290898786754,0.884545997673891,0.7344231162106207,0.12810988557970315,0.08022826215391399,0.09699656159742598,0.6212909014462635,0.9409293597668559,0.8870799278106343,0.028085676437840545,0.8449383864482414,0.07529139489859105,0.6109200226081098,0.46111659941659866,0.8658110876031009,0.20037064601758015,0.3884934920236158,0.18337220640377305,0.3691700204153576,0.8202018637691578,0.9519961581431066,0.2891754794365301,0.719687549653975,0.7272940090210113,0.30390801201853634,0.2042197204063806,0.5132942194764024,0.43122731533343417,0.8945010667901325,0.45358141178527234,0.9439794047018031,0.014903898906995394,0.23597943260784926,0.946019851119781,0.24806719468085037,0.3159444647200438,0.24359414698993342,0.43901485207095625,0.8170567499903985,0.35619565920679797,0.7476620967761007,0.6385496946324145,0.8927638716860876,0.08753812598963773,0.61370139398301,0.9420404629820935,0.994590597157377,0.9820016017948356,0.6724985481047637,0.7289636002556931,0.8249146433361916,0.35464364362053047,0.3949828019615689,0.33052377135094746,0.3233664890395268,0.46081756018300046,0.3118712727551809,0.43345876942126704,0.3260404986804897,0.3529123471375992,0.7869624822788952,0.04
73537514957848,0.6924939796517883,0.30991948261866753,0.9400313016305064,0.4756319547556487,0.07311855830012803,0.1709968587763071,0.6539442413233122,0.4630207685933987,0.14449433708950843,0.7985017980894323,0.6119434479238345,0.8528449506299585,0.9872191753342655,0.6152727745089424,0.046736299188027686,0.65308906725064,0.7729670750428828,0.7284165026054096,0.48293564248840926,0.9854836455171545,0.9178022600841801,0.34696574369007016,0.690415889880481,0.28414483421159376,0.9540641293580162,0.38131104457956455,0.1261133062367651,0.49705891189096163,0.23964081640524293,0.7780418042146704,0.7595282538693462,0.20528444424644887,0.980206518869421,0.5324094512663566,0.6019158214301803,0.5843962794252021,0.07354875083081214,0.696304817359266,0.15190703084213497,0.9863108802521566,0.6697431841211515,0.5445909745917344,0.8384244890868814,0.6785698995114577,0.7811972964044696,0.2562246111210592,0.8447484399868579,0.17465208623055095,0.43638504046888393,0.4198172739401427,0.4608729527714356,0.489516366975732,0.05760413251313978,0.16366776959272833,0.3645407582249146,0.4851342148584654,0.047153363930932723,0.8321484223264899,0.7250264871308749,0.3538021544238088,0.7513584536426023,0.20162087026667952,0.60603916692737,0.48206036366740035,0.402005059516947,0.9435182638586305,0.6465970451390886,0.9477765928948979,0.874161781290922,0.5615931857169637,0.3338823056507928,0.6831644796219122,0.5623626590692059,0.14576048686740173,0.01833770451925676,0.9789987957047084,0.2640984777775175,0.39146742023984493,0.02451140765576021,0.08960448095232099,0.5518248646853536,0.5490268753154774,0.36050666399673315,0.2021752264608172,0.7335792916306215,0.6008753940997122,0.7414753362862627,0.6350992948990224,0.2922845818055545,0.9592943065377412,0.2804987481440011,0.8110273722834713,0.32943878998686815,0.363999624467101,0.9811899355542835,0.7589601796050752,0.1009985270192918,0.6864641027616184,0.26522432995842415,0.8825645135714185,0.8081121787269732,0.9444046580345751,0.551900848738695,0.97135829
95421293,0.5065808701415008,0.368919531637659,0.40873209499359076,0.03969756276112768,0.4425816364891796,0.6848686290238956,0.19889958432959354,0.8687634773543643,0.6651718862665918,0.6356816934879368,0.8648589646746481,0.2677596530827896,0.36600819116226846,0.42033281608314377,0.36531170831327475,0.4197623403150389,0.07789319054464239,0.7420582979195541,0.6460624671550061,0.2877628544882993,0.4495814322566124,0.5938125706575114,0.8260910888171848,0.38914050137791245,0.5829392635008687,0.2813845694459911,0.6395813378050346,0.24811370548849432,0.3343160782943666,0.09725005585633895,0.4847360948511512,0.2687398926271285,0.14808403311184481,0.5231319432776479,0.6307116749771596,0.52471064607922,0.6253843738474663,0.7665601355498147,0.1428453184201377,0.6343369180814028,0.9823294873041695,0.07733228797115299,0.5755527788166522,0.922324338986731,0.9225850134856965,0.3790798552142043,0.953858991562216,0.38415458630228294,0.06181230678062344,0.6550394864500888,0.47611701932769723,0.5494443288265589,0.4324090423902712,0.3932805243694856,0.16864401113504202,0.7775540652635621,0.7887273450323801,0.9308429505357023,0.45018557136460013,0.4156048716088597,0.7182835290814121,0.9123931046834988,0.18235251314721512,0.9063149829103013,0.6990494170707794,0.39097521409527247,0.4193201885536534,0.49445483459325634,0.05740682437229272,0.6431827099083371,0.8166082366564943,0.5508343072899616,0.7256920246415933,0.18337398908784097,0.451231073641423,0.5972074723074453,0.7987003804138606,0.22660750487071735,0.2335616772249629,0.8686193878497064,0.31440938574190813,0.6693297852632178,0.6964603487965054,0.8659050767063006,0.8475111619360747,0.49365453150025906,0.07391437380271726,0.22116243626286547,0.7312508735917085,0.6027393423240437,0.35129462652974064,0.3465864119279116,0.3703542841041626,0.8094999747945035,0.06716333543270425,0.5851622460640429,0.14230709728676771,0.4269895995015133,0.4614655373713241,0.7824880489001339,0.7600142707265113,0.2963367257810231,0.8621822883863225,0.05235962
6363379874,0.6183454748338553,0.33900023179204875,0.5378010477518648,0.22541937206252038,0.9931039723425606,0.9675747243229571,0.9766791930154183,0.9173400349776913,0.8396929879522501,0.23356363312632777,0.2065464239202588,0.3779029317230984,0.9136294774803677,0.9439723417847732,0.5092716262142254,0.48504881092301944,0.13653056003327946,0.8183387497099125,0.0331588146709193,0.50095843674781,0.7806518912256648,0.6475843376945346,0.5897525020679655,0.29325534840563416,0.8008751718294244,0.4328256584162383,0.23450579802684068,0.9420568682950184,0.2488927900874871,0.9425558777545748,0.7642707539488459,0.5325969513740106,0.7717565481986975,0.7890902763932466,0.7103689099942037,0.5354685848625581,0.7367184049827455,0.1686925117947331,0.10861437302766952,0.9611273999573242,0.8747513343418708,0.48459814415981495,0.014070087412338728,0.02305889901290603,0.031135944324759635,0.34372533680443296,0.9040009230732432,0.4358489099388182,0.5644110802073957,0.6017542639335309,0.2534139640719454,0.3243219329654168,0.5393637662164339,0.5789166757050558,0.8096333835396873,0.4632541453869551,0.9185573634306591,0.858751013861756,0.2759179348771523,0.7905371905553881,0.7789511346816683,0.3839700193817629,0.8081763991174308,0.6891078405881509,0.6269261504331911,0.21780056336117948,0.6836391857232487,0.11825352894951346,0.33766309101219516,0.13568761604852353,0.4176167996752481,0.6361644820066347,0.5251611745703416,0.22981408176366025,0.7950511872825743,0.34704854352813286,0.600151789226796,0.27637726279511654,0.009408518745507188,0.3192174715313898,0.36600550621213,0.6773274846125051,0.6172847571590879,0.9311688605029549,0.27377885148175063,0.9460533057553854,0.2913624118199071,0.8204675872825563,0.7892562982153494,0.15392862307093247,0.15994599104395302,0.8101965518306578,0.700648512432918,0.12831757568104618,0.7629399033324894,0.1441525045437101,0.5441058239830393,0.6851268796856302,0.9758533122031383,0.1003414830506062,0.37017975197727915,0.2462880429675185,0.05978847261958431,0.2720694
9196450814,0.10673520905937184,0.306220680805266,0.9685321509423981,0.15570630720323053,0.4936351602082527,0.7151622796766401,0.1237543647233661,0.8473537074695925,0.24442084479144266,0.33607739579031426,0.018115643670830295,0.3699475615065264,0.37016379586188497,0.18656201632864755,0.806151371011321,0.7928699000282414,0.46513037650373223,0.329270027309166,0.041278772423602006,0.7438587204763184,0.3490947712260499,0.20990022220440185,0.2172263603749015,0.040371474533752894,0.9931209031902167,0.7505787050723491,0.4752504496548662,0.7245038479784707,0.14411305860879808,0.15365070658792612,0.18205715860560456,0.2966369670062392,0.04705249350855434,0.6103104342272561,0.7737733289911116,0.8050789452554619,0.9466124002999634,0.26506301046790604,0.9785577751820916,0.07498928516632541,0.6339060223263526,0.003450853078008076,0.3314065343325563,0.6746002663570336,0.042332057357039266,0.672359971064394,0.8979740160617136,0.4251330025744975,0.306930056746781,0.09897242917664417,0.4436124609406775,0.8267545056167179,0.9632095324479387,0.13041182211552116,0.9604230164238396,0.04345582265431236,0.8751435210737354,0.28500650544673733,0.38378064959598335,0.03465947042858175,0.25328939087379054,0.7736110291507389,0.7238091001580595,0.3833774701000907,0.31954266760720473,0.7377065889931448,0.33903482326043755,0.4848160564417746,0.9047841014938333,0.38052100271838474,0.4645025688652976,0.4279271473464069,0.17597086340504953,0.1410891495679616,0.7424754337992178,0.5610963378892787,0.7679804773374265,0.6105886490105265,0.8909148914699169,0.6568203409377192,0.3438603449066845,0.18643825260566782,0.0813386303758864,0.5123023073368081,0.6524875363393821,0.8436054539412252,0.567513440941363,0.8826869271952684,0.2997836945434066,0.32167460233456424,0.40263185496016896,0.6211802011894825,0.9833993792193437,0.28477909469513585,0.7771975996953422,0.30265656958952203,0.331795894405293,0.1649436192774585,0.8021750254966302,0.4384474163613892,0.7808239610839759,0.15215746047193557,0.469908886011374
6,0.30704439786215587,0.2462592329875707,0.9586181350852072,0.5105969130163899,0.5056433255920798,0.6869193516029577,0.4944553083793344,0.6525191831138122,0.6850338014381768,0.736629004868637,0.8534453113976809,0.8703113338265951,0.930586588023661,0.7694532599284963,0.586141896669528,0.6546544257429624,0.8105895290772146,0.3439885543392259,0.5025130081349212,0.8900152694628864,0.285841864789273,0.6402695019585697,0.1107462537840661,0.9350730363617035,0.49389571610717997,0.27769209100116765,0.8703529278210244,0.12627004302056066,0.3605064590717658,0.9007904783017664,0.9779484751765228,0.8811145316730168,0.33301128788927326,0.9291180515339141,0.55728312822917,0.786446015876723,0.18266015145848213,0.9273020980121184,0.5825682621210736,0.9040225703058663,0.12129468874880234,0.0251451171342999,0.5643425513363478,0.911664871167829,0.06996308931171413,0.42566106277709115,0.6302924738500384,0.5257577177154867,0.4127013498419865,0.24694533165957167,0.5154985680164783,0.2007584686909426,0.3461609913390381,0.8486602106025913,0.12621390416132394,0.6666123980026819,0.5518440991166733,0.972460928596163,0.03534068489094533,0.22296584402377395,0.7966879201897958,0.8514918452945477,0.7226013088503149,0.8656950124911229,0.8780723559959276,0.03710341296578901,0.40414648347093396,0.5292558928416937,0.5228148180730385,0.13582594763746536,0.9987380206050711,0.7256529534358342,0.439589978631998,0.9595226994731351,0.7530751988890563,0.39947383618643983,0.02063209462873916,0.829087474939582,0.8351690148576493,0.8543417589757122,0.5856481643425053,0.6820065694636487,0.7381107042960315,0.4110136191032805,0.8064263941955222,0.2668393295570398,0.9392031440950421,0.9275206476355388,0.06894807343725595,0.5853660858983957,0.4107221760005356,0.6864660738010668,0.8171364709512116,0.4508336405901341,0.4409439220626159,0.4350159071481985,0.6275240395626774,0.15075028090293896,0.5290124388197572,0.2914507242209926,0.8630656282508112,0.8717291789840406,0.16300601947880833,0.6993589497719076,0.3385025919
190727,0.6510522493507798,0.08458974795780871,0.21003362855068386,0.7113918666013439,0.8954808926267805,0.759770693725449,0.47408846368453195,0.06892499393055662,0.0423017164942755,0.2877514576847209,0.5069106805156417,0.4683075156571874,0.031967643961321235,0.9968715036195414,0.18450828620267568,0.3619285834360736,0.42278502409175756,0.9854969008460084,0.6214271764231925,0.057937248820826626,0.8388323412460185,0.5866798049451085,0.9616133212192609,0.04467029422036439,0.7936607960541312,0.6501827469904199,0.6848590393095434,0.4531283779400739,0.35862724298103543,0.2212691219591133,0.9076931445290384,0.06818173038724284,0.815381176158307,0.6311900879055865,0.0758934412231479,0.7995333248052461,0.9084981709473198,0.07092419786716841,0.6679894838115112,0.8986159514307533,0.5449804683235614,0.6228247793118808,0.7089627423645215,0.18943033308422208,0.8076254201170809,0.3910399593851953,0.4319348101767235,0.33665051363519194,0.3631949191992708,0.3182132663731686,0.06611383200422483,0.7436848565256281,0.8577171327141851,0.21272573889718716,0.9137847098752738,0.7028820200527182,0.3409415606478464,0.16439720470442198,0.6820424759016057,0.7258387836767015,0.15111237566046976,0.22359976271174398,0.09413382464019371,0.2848327681786349,0.15689537648548724,0.578618490819038,0.2153002867370637,0.04958987527626557,0.017375393619843527,0.01711757372770073,0.9707542157538412,0.0414909784850398,0.6233207232744715,0.9978677823075806,0.5687919471912469,0.70590892271362,0.09861104516191643,0.9019561910780584,0.3368451939367548,0.386686996234645,0.7938117853122615,0.4101345735005294,0.39885564324755973,0.4470375405518736,0.7361781600915861,0.8681478140323767,0.8294677299727603,0.5133741033188506,0.7975233514433697,0.7001492002009124,0.8737796949647152,0.4338266259529373,0.258057870901541,0.349315929581394,0.08431140469987242,0.334917823105384,0.4619812651968346,0.13998734579037264,0.23039801866196774,0.9656348048543126,0.8390172983201364,0.32897673032831254,0.9129130465904333,0.8033315574
80408,0.25814628292651876,0.9504426370025882,0.2594819229246007,0.00011259273957098248,0.2609411770679063,0.9270080789491945,0.9536592563678277,0.8591417584169584,0.7645035508207467,0.9549736676243905,0.7444879630755111,0.9846866708292493,0.11553974768215258,0.3747767096321074,0.9625139283323343,0.4871812037216665,0.17076308096966342,0.44738417078055737,0.7423817723570533,0.10822699642131739,0.28047501720412704,0.5815688514623982,0.7330930310318742,0.42386572336970163,0.4853615608987022,0.8237284796985186,0.2209285648303254,0.5804922337018175,0.196791120825625,0.08039874162254823,0.9918765804658749,0.993905036933092,0.4479369428038624,0.45296878845067423,0.9451242210182826,0.36791610327618973,0.9259472524372425,0.7430362015736331,0.33366454104082277,0.5844446741806159],i30xo2NGuA,"{""y"": 100}"

Note that vector, array, and JSON values are all written to the CSV file as JSON strings.

Vector and array values are parsed as JSON, so custom bracket or separator conventions such as {1,2} or [1|2] are not supported. This is because the elements of an array of strings may contain special characters, which would cause unexpected CSV parsing errors.
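As a rough illustration of the format described above, the following sketch writes one row with the header `id,vector,str,$meta`, encoding the vector and `$meta` values as JSON strings. The helper name `write_rows` and the three-element vector are assumptions made for this example; it uses only the Python standard library and is not taken from the Milvus code base.

```python
import csv
import io
import json


def write_rows(rows, fieldnames):
    """Serialize rows to CSV text, JSON-encoding list and dict values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fieldnames)  # first row: column names
    for row in rows:
        writer.writerow([
            # Vectors, arrays, and JSON objects become JSON strings;
            # scalars are written as-is.
            json.dumps(row[name]) if isinstance(row[name], (list, dict)) else row[name]
            for name in fieldnames
        ])
    return buf.getvalue()


# Hypothetical sample row (a real vector would have the collection's dimension).
rows = [{"id": 0, "vector": [0.25, 0.5, 0.75], "str": "i30xo2NGuA", "$meta": {"y": 100}}]
csv_text = write_rows(rows, ["id", "vector", "str", "$meta"])
print(csv_text)
```

The `csv` module quotes any field that contains the delimiter or a double quote, so the JSON strings survive a round trip through a CSV parser.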

Migrate data from other databases to Milvus

CSV data files exported from other databases can be imported (only PostgreSQL with pgvector has been tested). When exporting data from PostgreSQL, array columns need to be converted to JSON strings with the array_to_json function.

  • Convert array columns to json strings
CREATE VIEW my_view AS 
SELECT 
    BoolColumn, 
    IntColumn, 
    FloatColumn, 
    StringColumn, 
    JsonColumn, 
    array_to_json(ArrayColumn) AS ArrayColumn 
FROM my_table;
  • Export to a csv file
COPY (SELECT * FROM my_view) TO '/path/to/your/file.csv' WITH CSV HEADER;
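Before importing, it may help to check that the exported file parses the way you expect. The sketch below is only a local sanity check, not how Milvus itself reads the file. The sample text and the helper `read_exported_csv` are hypothetical; the column names are lowercase because PostgreSQL folds unquoted identifiers to lowercase, and `t` is how CSV output renders a boolean true.

```python
import csv
import io
import json

# Hypothetical sample of what the exported file might contain.
SAMPLE = (
    "boolcolumn,intcolumn,floatcolumn,stringcolumn,jsoncolumn,arraycolumn\n"
    't,1,0.5,hello,"{""y"": 100}","[1,2,3]"\n'
)


def read_exported_csv(text, json_columns):
    """Parse CSV text and decode the named columns from JSON strings."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for name in json_columns:
            # Raises json.JSONDecodeError if a column was not exported as JSON,
            # e.g. a raw PostgreSQL array literal such as {1,2,3}.
            record[name] = json.loads(record[name])
        rows.append(record)
    return rows


rows = read_exported_csv(SAMPLE, ["jsoncolumn", "arraycolumn"])
print(rows[0]["arraycolumn"])
```

A column that was exported without array_to_json would fail at the `json.loads` call, which makes the problem visible before an import task is created.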

CSV delimiter configuration

The CSV delimiter is also configurable through the sep option, which must be a single Unicode character. The RESTful API call for creating an import task with a custom delimiter is as follows.

curl --request POST "http://localhost:19530/v2/vectordb/jobs/import/create" \
--header "Content-Type: application/json" \
--data-raw '{
    "files": [
        [
            "filepath"
        ]
    ],
    "collectionName": "collection_name",
    "options": {"sep": "\t"}
}'
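The same request can be assembled from Python. The sketch below only builds the request object and does not send it; the endpoint path and the `sep` option are taken from the curl call above, while the helper name `build_import_request` and its client-side length check are assumptions for illustration.

```python
import json
import urllib.request


def build_import_request(base_url, collection_name, file_paths, sep=","):
    """Build (but do not send) a POST request that creates an import task."""
    # Client-side check mirroring the stated rule: one Unicode character only.
    if len(sep) != 1:
        raise ValueError("sep must be exactly one Unicode character")
    payload = {
        "files": [list(file_paths)],
        "collectionName": collection_name,
        "options": {"sep": sep},
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/vectordb/jobs/import/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_import_request("http://localhost:19530", "collection_name", ["filepath"], sep="\t")
print(req.full_url)
print(req.data.decode("utf-8"))
```

To actually submit the task, the request could be passed to `urllib.request.urlopen(req)` against a running Milvus instance.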

Why is this needed?

No response

Anything else?

No response

@OxalisCu OxalisCu added the kind/enhancement Issues or changes related to enhancement label Jul 23, 2024
@bigsheeper
Contributor

bigsheeper commented Jul 24, 2024

Perhaps we could implement a tool that converts CSV files into Parquet format instead of having Milvus support CSV import directly. This would help cut down on code maintenance costs. @tedxu @xiaofan-luan @OxalisCu

@xiaofan-luan
Collaborator

@bigsheeper
I'm OK with supporting CSV files. Please help review it.

sre-ci-robot pushed a commit that referenced this issue Aug 21, 2024
See this issue for details: #34937

---------

Signed-off-by: OxalisCu <[email protected]>

stale bot commented Aug 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale (indicates no updates for 30 days) label Aug 24, 2024
@OxalisCu OxalisCu closed this as completed Sep 2, 2024