Python: combining rows of 2 csv files on multiple conditions [on hold]
up vote
-2
down vote
favorite
I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:
file1:
Sample Chromosome Name Total coverage Strand coverage Context coverage
0 x3ng 7 CpG 2 1 1
1 x3ng 7 CpG 2 1 1
2 x3ng 7 CpG 1 1 1
3 x3ng 7 CpG 1 1 1
4 x3ng 7 CpG 1 1 1
Methylated coverage Methylation level position complement
0 1 1 207605 non
1 1 1 207616 non
2 1 1 210880 non
3 1 1 218029 non
4 1 1 218064 non
file2:
Chromosome Gene ENSEMBL start end complement
0 7 AC093627.11 ENSG00000239715 174920 176013 non
1 7 FAM20C ENSG00000177706 192969 300711 non
2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement
3 11 MIR4694 ENSG00000264309 19781550 19781629 complement
4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement
5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement
and the result should look like this:
Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end
x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711
x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711
x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711
x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711
x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711
as you can see, the conditions are:
'Chromosome' in file1 = 'Chromosome' in file2
'complement' in file1 = 'complement' in file2
'position' of file 1 between 'start' and 'end' of file2
Thank you in advance.
python csv dataframe
New contributor
put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
|
show 1 more comment
up vote
-2
down vote
favorite
I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:
file1:
Sample Chromosome Name Total coverage Strand coverage Context coverage
0 x3ng 7 CpG 2 1 1
1 x3ng 7 CpG 2 1 1
2 x3ng 7 CpG 1 1 1
3 x3ng 7 CpG 1 1 1
4 x3ng 7 CpG 1 1 1
Methylated coverage Methylation level position complement
0 1 1 207605 non
1 1 1 207616 non
2 1 1 210880 non
3 1 1 218029 non
4 1 1 218064 non
file2:
Chromosome Gene ENSEMBL start end complement
0 7 AC093627.11 ENSG00000239715 174920 176013 non
1 7 FAM20C ENSG00000177706 192969 300711 non
2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement
3 11 MIR4694 ENSG00000264309 19781550 19781629 complement
4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement
5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement
and the result should look like this:
Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end
x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711
x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711
x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711
x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711
x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711
as you can see, the conditions are:
'Chromosome' in file1 = 'Chromosome' in file2
'complement' in file1 = 'complement' in file2
'position' of file 1 between 'start' and 'end' of file2
Thank you in advance.
python csv dataframe
New contributor
put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
1
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, try this if you are looking for thispd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday
|
show 1 more comment
up vote
-2
down vote
favorite
up vote
-2
down vote
favorite
I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:
file1:
Sample Chromosome Name Total coverage Strand coverage Context coverage
0 x3ng 7 CpG 2 1 1
1 x3ng 7 CpG 2 1 1
2 x3ng 7 CpG 1 1 1
3 x3ng 7 CpG 1 1 1
4 x3ng 7 CpG 1 1 1
Methylated coverage Methylation level position complement
0 1 1 207605 non
1 1 1 207616 non
2 1 1 210880 non
3 1 1 218029 non
4 1 1 218064 non
file2:
Chromosome Gene ENSEMBL start end complement
0 7 AC093627.11 ENSG00000239715 174920 176013 non
1 7 FAM20C ENSG00000177706 192969 300711 non
2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement
3 11 MIR4694 ENSG00000264309 19781550 19781629 complement
4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement
5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement
and the result should look like this:
Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end
x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711
x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711
x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711
x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711
x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711
as you can see, the conditions are:
'Chromosome' in file1 = 'Chromosome' in file2
'complement' in file1 = 'complement' in file2
'position' of file 1 between 'start' and 'end' of file2
Thank you in advance.
python csv dataframe
New contributor
I want to combine 2 CSV files with a different number of columns and each having unique ones too. Both files are thousands of lines long.
the files look like this:
file1:
Sample Chromosome Name Total coverage Strand coverage Context coverage
0 x3ng 7 CpG 2 1 1
1 x3ng 7 CpG 2 1 1
2 x3ng 7 CpG 1 1 1
3 x3ng 7 CpG 1 1 1
4 x3ng 7 CpG 1 1 1
Methylated coverage Methylation level position complement
0 1 1 207605 non
1 1 1 207616 non
2 1 1 210880 non
3 1 1 218029 non
4 1 1 218064 non
file2:
Chromosome Gene ENSEMBL start end complement
0 7 AC093627.11 ENSG00000239715 174920 176013 non
1 7 FAM20C ENSG00000177706 192969 300711 non
2 11 RP11-359E10.1 ENSG00000270607 19732480 19734165 complement
3 11 MIR4694 ENSG00000264309 19781550 19781629 complement
4 11 NAV2-AS3 ENSG00000254542 20000245 20002883 complement
5 11 NAV2-AS2 ENSG00000254453 20065392 20070849 complement
and the result should look like this:
Sample Chromosome Name Total coverage Strand coverage Context coverage Methylated coverage Methylation level position complement Gene ENSEMBL start end
x3ng 7 CpG 2 1 1 1 1 207605 non FAM20C ENSG00000177706 192969 300711
x3ng 7 CpG 2 1 1 1 1 207616 non FAM20C ENSG00000177707 192969 300711
x3ng 7 CpG 1 1 1 1 1 210880 non FAM20C ENSG00000177708 192969 300711
x3ng 7 CpG 1 1 1 1 1 218029 non FAM20C ENSG00000177709 192969 300711
x3ng 7 CpG 1 1 1 1 1 218064 non FAM20C ENSG00000177710 192969 300711
as you can see, the conditions are:
'Chromosome' in file1 = 'Chromosome' in file2
'complement' in file1 = 'complement' in file2
'position' of file 1 between 'start' and 'end' of file2
Thank you in advance.
python csv dataframe
python csv dataframe
New contributor
New contributor
edited 11 hours ago
New contributor
asked yesterday
P.Stegen
14
14
New contributor
New contributor
put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
put on hold as unclear what you're asking by timgeb, legoscia, greg-449, Robert Columbia, Rob yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
1
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, try this if you are looking for thispd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday
|
show 1 more comment
1
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, try this if you are looking for thispd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday
1
1
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, try this if you are looking for this
pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday
@P.Stegen, try this if you are looking for this
pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday
|
show 1 more comment
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
1
Please be clearer about what the exact problem here is. Do we have to start with how to read in a csv file? If yes, the question is too broad.
– timgeb
yesterday
No i know that, i just don't know how to work with multiple conditions.
– P.Stegen
yesterday
Good, then please provide a Minimal, Complete, and Verifiable example and your own attempt.
– timgeb
yesterday
@P.Stegen, are you saying to combine two csv files for the rows while removing duplicates between them? also please provide the code which you have tried so far and update that in text form rather in image form.
– pygo
yesterday
@P.Stegen, try this if you are looking for this
pd.concat([df1, df2], axis = 1, join = 'inner', join_axes = [df1.index])[df1.columns]
– pygo
yesterday