Neo4j: what is the most efficient way to import a CSV?



























I have the following query and I am looking for a more efficient alternative, if one exists: my CSV has a lot of rows and Neo4j takes too long to import them all.



using periodic commit 1000
load csv with headers from "file:///registry_office.csv" as f
fieldterminator "|"
WITH f AS a
WHERE NOT a.JobName IS NULL and NOT a.JobCode IS NULL and NOT
a.JobDescription IS NULL and NOT a.JobLongDescription IS NULL
AND NOT a.Long_Description IS NULL AND NOT a.Position IS NULL
AND NOT a.birthDate IS NULL AND NOT a.startWorkingDate IS NULL
merge (b:Job{Name:a.JobName, Code:a.JobCode, Job:a.JobDescription,
JobLongDescription:a.JobLongDescription})
merge (c:Person{PersonName:a.PersonName, PersonSurname:a.PersonSurname,
CF:a.CF, birthDate:a.birthDate, address:a.address, age:a.age,
married:a.married, birthPlace:a.birthPlace})
merge (b)<-[:RELATED_TO{startWorkingDate:a.startWorkingDate,
JobPosition:a.Position}]-(c)
return *;


Do you have any suggestions for me?



































  • Have you created any constraints in your graph? Can you give the EXPLAIN output of your query?

    – logisima
    Nov 16 '18 at 13:28











  • Also include your Neo4j version.

    – Tezra
    Nov 16 '18 at 14:10











  • Yes, I created constraints and an index, and I have the latest version of Neo4j.

    – raf
    Nov 16 '18 at 14:15











  • Please add EXPLAIN to the start of your cypher, and share the output.

    – Tezra
    Nov 16 '18 at 21:03
















csv graph neo4j cypher load














asked Nov 16 '18 at 9:52 by raf (edited Nov 16 '18 at 10:07)

























1 Answer














The import tool is generally much faster than LOAD CSV.



However, your query suggests that each CSV row ends up as a pattern (b)<--(c), so you'd need to do some pre-processing on this CSV: first filter out the rows with null values, then split it into 3 CSVs (2 for nodes, 1 for relationships).



To do this, you have 3 main options:





  • Excel - not viable for huge CSVs


  • CLI tool - something like csvkit


  • Program - if you are OK with Python or JavaScript, you'll be able to do this in 20 minutes or so.
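As a rough illustration of the "Program" option, here is a minimal Python sketch of the filter-and-split step. It is not taken from the answer and rests on several assumptions: that JobCode identifies a Job and CF identifies a Person (the question does not say which columns are unique), that empty fields count as missing, and that the output file names and the `split_rows`, `write_csv` and `convert` helpers are hypothetical names chosen for this example. The header rows follow the import tool's `:ID`, `:START_ID`, `:END_ID`, `:LABEL` and `:TYPE` conventions. Unlike the original MERGE on all properties, the first row seen for a given key wins.

```python
import csv

# Columns the original query requires to be non-null.
REQUIRED = ["JobName", "JobCode", "JobDescription", "JobLongDescription",
            "Long_Description", "Position", "birthDate", "startWorkingDate"]
# Optional Person properties copied through as-is.
PERSON_PROPS = ["PersonName", "PersonSurname", "birthDate", "address",
                "age", "married", "birthPlace"]

JOB_HEADER = ["Code:ID(Job)", "Name", "Job", "JobLongDescription", ":LABEL"]
PERSON_HEADER = ["CF:ID(Person)"] + PERSON_PROPS + [":LABEL"]
REL_HEADER = [":START_ID(Person)", ":END_ID(Job)", "startWorkingDate",
              "JobPosition", ":TYPE"]


def split_rows(rows):
    """Filter incomplete rows and split the rest into jobs, persons, rels.

    Assumption: JobCode is the Job key and CF is the Person key.
    """
    jobs, persons, rels = {}, {}, {}
    for row in rows:
        # Skip rows where a required column (or the assumed CF key) is empty.
        if any(not row.get(col) for col in REQUIRED + ["CF"]):
            continue
        job_id, person_id = row["JobCode"], row["CF"]
        # setdefault keeps the first record seen for each key (deduplication).
        jobs.setdefault(job_id, (job_id, row["JobName"], row["JobDescription"],
                                 row["JobLongDescription"], "Job"))
        persons.setdefault(
            person_id,
            (person_id,) + tuple(row.get(c) or "" for c in PERSON_PROPS) + ("Person",),
        )
        rel = (person_id, job_id, row["startWorkingDate"], row["Position"],
               "RELATED_TO")
        rels[rel] = None  # dict used as an insertion-ordered set
    return list(jobs.values()), list(persons.values()), list(rels)


def write_csv(path, header, records):
    """Write one header row followed by the records, comma-separated."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(records)


def convert(src="registry_office.csv"):
    """Read the pipe-delimited source file and write the three output files."""
    with open(src, newline="", encoding="utf-8") as f:
        jobs, persons, rels = split_rows(csv.DictReader(f, delimiter="|"))
    write_csv("jobs.csv", JOB_HEADER, jobs)
    write_csv("persons.csv", PERSON_HEADER, persons)
    write_csv("relationships.csv", REL_HEADER, rels)
```

Calling `convert()` next to `registry_office.csv` would produce the three files, which could then be passed to the import tool through its `--nodes` and `--relationships` options. The exact command syntax depends on the Neo4j version, and as far as I know the import tool is meant for populating a new database rather than adding to an existing one.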






answered Nov 16 '18 at 23:47 by Izhaki (edited Nov 17 '18 at 0:02)































