Python: combining rows of 2 csv files on multiple conditions [on hold]











up vote
-2
down vote

favorite












I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:



file1:



Sample Chromosome Name Total coverage Strand coverage Context coverage



0 x3ng 7 CpG 2 1 1

1 x3ng 7 CpG 2 1 1

2 x3ng 7 CpG 1 1 1

3 x3ng 7 CpG 1 1 1

4 x3ng 7 CpG 1 1 1



Methylated coverage Methylation level position complement

0 1 1 207605 non

1 1 1 207616 non

2 1 1 210880 non

3 1 1 218029 non

4 1 1 218064 non



file2:



Chromosome Gene ENSEMBL start end complement

0 7 AC093627.11 ENSG00000239715 174920 176013 non

1 7 FAM20C ENSG00000177706 192969 300711 non

2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement

3 11 MIR4694 ENSG00000264309 19781550 19781629 complement

4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement

5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement



and the result should look like this:



Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end

x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711

x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711

x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711

x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711

x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711



as you can see, the conditions are:




  1. 'Chromosome' in file1 = 'Chromosome' in file2


  2. 'complement' in file1 = 'complement' in file2


  3. 'position' of file 1 between 'start' and 'end' of file2



Thank you in advance.










share|improve this question









New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.











  • 1




    Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
    – timgeb
    yesterday










  • No i know that, i just don't know how to work with multiple conditions.
    – P.Stegen
    yesterday










  • Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
    – timgeb
    yesterday












  • @P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
    – pygo
    yesterday












  • @P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
    – pygo
    yesterday















up vote
-2
down vote

favorite












I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:



file1:



Sample Chromosome Name Total coverage Strand coverage Context coverage



0 x3ng 7 CpG 2 1 1

1 x3ng 7 CpG 2 1 1

2 x3ng 7 CpG 1 1 1

3 x3ng 7 CpG 1 1 1

4 x3ng 7 CpG 1 1 1



Methylated coverage Methylation level position complement

0 1 1 207605 non

1 1 1 207616 non

2 1 1 210880 non

3 1 1 218029 non

4 1 1 218064 non



file2:



Chromosome Gene ENSEMBL start end complement

0 7 AC093627.11 ENSG00000239715 174920 176013 non

1 7 FAM20C ENSG00000177706 192969 300711 non

2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement

3 11 MIR4694 ENSG00000264309 19781550 19781629 complement

4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement

5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement



and the result should look like this:



Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end

x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711

x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711

x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711

x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711

x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711



as you can see, the conditions are:




  1. 'Chromosome' in file1 = 'Chromosome' in file2


  2. 'complement' in file1 = 'complement' in file2


  3. 'position' of file 1 between 'start' and 'end' of file2



Thank you in advance.










share|improve this question









New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.











  • 1




    Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
    – timgeb
    yesterday










  • No i know that, i just don't know how to work with multiple conditions.
    – P.Stegen
    yesterday










  • Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
    – timgeb
    yesterday












  • @P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
    – pygo
    yesterday












  • @P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
    – pygo
    yesterday













up vote
-2
down vote

favorite









up vote
-2
down vote

favorite











I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:



file1:



Sample Chromosome Name Total coverage Strand coverage Context coverage



0 x3ng 7 CpG 2 1 1

1 x3ng 7 CpG 2 1 1

2 x3ng 7 CpG 1 1 1

3 x3ng 7 CpG 1 1 1

4 x3ng 7 CpG 1 1 1



Methylated coverage Methylation level position complement

0 1 1 207605 non

1 1 1 207616 non

2 1 1 210880 non

3 1 1 218029 non

4 1 1 218064 non



file2:



Chromosome Gene ENSEMBL start end complement

0 7 AC093627.11 ENSG00000239715 174920 176013 non

1 7 FAM20C ENSG00000177706 192969 300711 non

2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement

3 11 MIR4694 ENSG00000264309 19781550 19781629 complement

4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement

5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement



and the result should look like this:



Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end

x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711

x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711

x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711

x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711

x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711



as you can see, the conditions are:




  1. 'Chromosome' in file1 = 'Chromosome' in file2


  2. 'complement' in file1 = 'complement' in file2


  3. 'position' of file 1 between 'start' and 'end' of file2



Thank you in advance.










share|improve this question









New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:



file1:



Sample Chromosome Name Total coverage Strand coverage Context coverage



0 x3ng 7 CpG 2 1 1

1 x3ng 7 CpG 2 1 1

2 x3ng 7 CpG 1 1 1

3 x3ng 7 CpG 1 1 1

4 x3ng 7 CpG 1 1 1



Methylated coverage Methylation level position complement

0 1 1 207605 non

1 1 1 207616 non

2 1 1 210880 non

3 1 1 218029 non

4 1 1 218064 non



file2:



Chromosome Gene ENSEMBL start end complement

0 7 AC093627.11 ENSG00000239715 174920 176013 non

1 7 FAM20C ENSG00000177706 192969 300711 non

2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement

3 11 MIR4694 ENSG00000264309 19781550 19781629 complement

4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement

5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement



and the result should look like this:



Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end

x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711

x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711

x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711

x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711

x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711



as you can see, the conditions are:




  1. 'Chromosome' in file1 = 'Chromosome' in file2


  2. 'complement' in file1 = 'complement' in file2


  3. 'position' of file 1 between 'start' and 'end' of file2



Thank you in advance.







python csv dataframe






share|improve this question









New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 11 hours ago





















New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









P.Stegen

14




14




New contributor




P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






P.Stegen is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.






put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.










  • 1




    Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
    – timgeb
    yesterday










  • No i know that, i just don't know how to work with multiple conditions.
    – P.Stegen
    yesterday










  • Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
    – timgeb
    yesterday












  • @P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
    – pygo
    yesterday












  • @P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
    – pygo
    yesterday














  • 1




    Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
    – timgeb
    yesterday










  • No i know that, i just don't know how to work with multiple conditions.
    – P.Stegen
    yesterday










  • Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
    – timgeb
    yesterday












  • @P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
    – pygo
    yesterday












  • @P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
    – pygo
    yesterday








1




1




Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday




Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday












No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday




No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday












Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday






Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday














@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday






@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday














@P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday




@P.Stegen, try this if you are looking for this pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday

















active

oldest

votes






















active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes

Popular posts from this blog

Xamarin.iOS Cant Deploy on Iphone

Glorious Revolution

Dulmage-Mendelsohn matrix decomposition in Python