Looking for advice on processing variable length text files

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.

The table included in each segment should start 1 line below "Group:" and stop either 2 lines above "~End" if the group is a control or 6 lines above "~End" if if the group is a standard. The lengths of the tables themselves will be variable and can be empty like the segment "SpikedControl".

and example file looks like this:

Group:  Controls    1

Sample  Wells   Values  MeanValue   CV  Od-bkgd-blank

Anti-Hu Det A11 2.849   2.855   0.282   2.853

    A23 2.860           

Coat Control    A12 0.161   0.160   0.530   0.159

    A24 0.160           

Diluent Standard 1  A9  0.114   0.113   1.379   0.104

    A21 0.112           

Diluent Standard 2  A8  0.012   0.013   2.817   0.012

    A20 0.013           



~End

Group:  SpikedControl   1

Sample  Wells   Concentration   Values  MeanValue   CV  ODbkgdblank Conc    %Expected



~End

Group:  Standards   1

Sample  ExpConc Wells   OD  CV OD   ODblank MeanODBlank Result  %Recovery

St001   2000.000    B1  2.939   1.4 2.932   2.904   Range?  Range?

        B13 2.882       2.875       1153.779    57.689

St002   666.667 B2  2.820   0.9 2.812   2.829   456.435 68.465

        B14 2.855       2.847       670.358 100.554

St003   222.222 B3  2.677   0.9 2.669   2.686   208.849 93.982

        B15 2.709       2.702       237.852 107.033

St004   74.074  B4  2.215   1.4 2.205   2.226   72.185  97.449

        B16 2.258       2.248       77.452  104.560

St005   24.691  B5  1.406   1.3 1.397   1.410   24.296  98.397

        B17 1.433       1.424       25.153  101.868

St006   8.230   B6  0.669   2.9 0.658   0.672   7.781   94.536

        B18 0.697       0.686       8.240   100.115

St007   2.743   B7  0.357   5.8 0.348   0.334   3.143   114.579

        B19 0.329       0.320       2.759   100.579

St008   0.914   B8  0.198   3.7 0.191   0.186   1.029   112.551

        B20 0.188       0.181       0.895   97.891

St009   0.305   B9  0.163   7.8 0.154   0.146   0.532   174.477

        B21 0.146       0.137       0.296   97.190

St010   0.102   B10 0.130   5.1 0.123   0.119   0.096   94.087

        B22 0.121       0.114       Range?  Range?

St011   0.034   B11 0.133   4.7 0.126   0.122   0.134   394.778

        B23 0.125       0.117       Range?  Range?

St012   0.011   B12 0.117   0.7 0.105   0.104   Range?  Range?

        B24 0.115       0.104       Range?  Range?



EC50 = 28.085



AUC = 5565.432



~End

I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.

Thanks!

Edit - Link to example file:

https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0

PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.

Edit 2 - Making some progress:

Read in file and get start and end lines for each segment

inputtext <- readLines("ExampleFile.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)

which I can follow up with

test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)

now I am just trying to figure out how to accurately identify the number of rows to read.

So if the start row is 17, and the end row is 48. Then nrows needs to 24 which is 48 (end row indicated) - 17 (the first rows that are skipped) - 7 (to account for the header line and lines of fluff on the end of the table, which could also be 4 if it is a control table)

Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

add a comment |

I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.

and example file looks like this:

Group:  Controls    1

Sample  Wells   Values  MeanValue   CV  Od-bkgd-blank

Anti-Hu Det A11 2.849   2.855   0.282   2.853

    A23 2.860           

Coat Control    A12 0.161   0.160   0.530   0.159

    A24 0.160           

Diluent Standard 1  A9  0.114   0.113   1.379   0.104

    A21 0.112           

Diluent Standard 2  A8  0.012   0.013   2.817   0.012

    A20 0.013           



~End

Group:  SpikedControl   1

Sample  Wells   Concentration   Values  MeanValue   CV  ODbkgdblank Conc    %Expected



~End

Group:  Standards   1

Sample  ExpConc Wells   OD  CV OD   ODblank MeanODBlank Result  %Recovery

St001   2000.000    B1  2.939   1.4 2.932   2.904   Range?  Range?

        B13 2.882       2.875       1153.779    57.689

St002   666.667 B2  2.820   0.9 2.812   2.829   456.435 68.465

        B14 2.855       2.847       670.358 100.554

St003   222.222 B3  2.677   0.9 2.669   2.686   208.849 93.982

        B15 2.709       2.702       237.852 107.033

St004   74.074  B4  2.215   1.4 2.205   2.226   72.185  97.449

        B16 2.258       2.248       77.452  104.560

St005   24.691  B5  1.406   1.3 1.397   1.410   24.296  98.397

        B17 1.433       1.424       25.153  101.868

St006   8.230   B6  0.669   2.9 0.658   0.672   7.781   94.536

        B18 0.697       0.686       8.240   100.115

St007   2.743   B7  0.357   5.8 0.348   0.334   3.143   114.579

        B19 0.329       0.320       2.759   100.579

St008   0.914   B8  0.198   3.7 0.191   0.186   1.029   112.551

        B20 0.188       0.181       0.895   97.891

St009   0.305   B9  0.163   7.8 0.154   0.146   0.532   174.477

        B21 0.146       0.137       0.296   97.190

St010   0.102   B10 0.130   5.1 0.123   0.119   0.096   94.087

        B22 0.121       0.114       Range?  Range?

St011   0.034   B11 0.133   4.7 0.126   0.122   0.134   394.778

        B23 0.125       0.117       Range?  Range?

St012   0.011   B12 0.117   0.7 0.105   0.104   Range?  Range?

        B24 0.115       0.104       Range?  Range?



EC50 = 28.085



AUC = 5565.432



~End

I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.

Thanks!

Edit - Link to example file:

https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0

PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.

Edit 2 - Making some progress:

Read in file and get start and end lines for each segment

inputtext <- readLines("ExampleFile.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)

which I can follow up with

test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)

now I am just trying to figure out how to accurately identify the number of rows to read.

Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

add a comment |

I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.

and example file looks like this:

Group:  Controls    1

Sample  Wells   Values  MeanValue   CV  Od-bkgd-blank

Anti-Hu Det A11 2.849   2.855   0.282   2.853

    A23 2.860           

Coat Control    A12 0.161   0.160   0.530   0.159

    A24 0.160           

Diluent Standard 1  A9  0.114   0.113   1.379   0.104

    A21 0.112           

Diluent Standard 2  A8  0.012   0.013   2.817   0.012

    A20 0.013           



~End

Group:  SpikedControl   1

Sample  Wells   Concentration   Values  MeanValue   CV  ODbkgdblank Conc    %Expected



~End

Group:  Standards   1

Sample  ExpConc Wells   OD  CV OD   ODblank MeanODBlank Result  %Recovery

St001   2000.000    B1  2.939   1.4 2.932   2.904   Range?  Range?

        B13 2.882       2.875       1153.779    57.689

St002   666.667 B2  2.820   0.9 2.812   2.829   456.435 68.465

        B14 2.855       2.847       670.358 100.554

St003   222.222 B3  2.677   0.9 2.669   2.686   208.849 93.982

        B15 2.709       2.702       237.852 107.033

St004   74.074  B4  2.215   1.4 2.205   2.226   72.185  97.449

        B16 2.258       2.248       77.452  104.560

St005   24.691  B5  1.406   1.3 1.397   1.410   24.296  98.397

        B17 1.433       1.424       25.153  101.868

St006   8.230   B6  0.669   2.9 0.658   0.672   7.781   94.536

        B18 0.697       0.686       8.240   100.115

St007   2.743   B7  0.357   5.8 0.348   0.334   3.143   114.579

        B19 0.329       0.320       2.759   100.579

St008   0.914   B8  0.198   3.7 0.191   0.186   1.029   112.551

        B20 0.188       0.181       0.895   97.891

St009   0.305   B9  0.163   7.8 0.154   0.146   0.532   174.477

        B21 0.146       0.137       0.296   97.190

St010   0.102   B10 0.130   5.1 0.123   0.119   0.096   94.087

        B22 0.121       0.114       Range?  Range?

St011   0.034   B11 0.133   4.7 0.126   0.122   0.134   394.778

        B23 0.125       0.117       Range?  Range?

St012   0.011   B12 0.117   0.7 0.105   0.104   Range?  Range?

        B24 0.115       0.104       Range?  Range?



EC50 = 28.085



AUC = 5565.432



~End

I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.

Thanks!

Edit - Link to example file:

https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0

PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.

Edit 2 - Making some progress:

Read in file and get start and end lines for each segment

inputtext <- readLines("ExampleFile.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)

which I can follow up with

test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)

now I am just trying to figure out how to accurately identify the number of rows to read.

Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.

and example file looks like this:

Group:  Controls    1

Sample  Wells   Values  MeanValue   CV  Od-bkgd-blank

Anti-Hu Det A11 2.849   2.855   0.282   2.853

    A23 2.860           

Coat Control    A12 0.161   0.160   0.530   0.159

    A24 0.160           

Diluent Standard 1  A9  0.114   0.113   1.379   0.104

    A21 0.112           

Diluent Standard 2  A8  0.012   0.013   2.817   0.012

    A20 0.013           



~End

Group:  SpikedControl   1

Sample  Wells   Concentration   Values  MeanValue   CV  ODbkgdblank Conc    %Expected



~End

Group:  Standards   1

Sample  ExpConc Wells   OD  CV OD   ODblank MeanODBlank Result  %Recovery

St001   2000.000    B1  2.939   1.4 2.932   2.904   Range?  Range?

        B13 2.882       2.875       1153.779    57.689

St002   666.667 B2  2.820   0.9 2.812   2.829   456.435 68.465

        B14 2.855       2.847       670.358 100.554

St003   222.222 B3  2.677   0.9 2.669   2.686   208.849 93.982

        B15 2.709       2.702       237.852 107.033

St004   74.074  B4  2.215   1.4 2.205   2.226   72.185  97.449

        B16 2.258       2.248       77.452  104.560

St005   24.691  B5  1.406   1.3 1.397   1.410   24.296  98.397

        B17 1.433       1.424       25.153  101.868

St006   8.230   B6  0.669   2.9 0.658   0.672   7.781   94.536

        B18 0.697       0.686       8.240   100.115

St007   2.743   B7  0.357   5.8 0.348   0.334   3.143   114.579

        B19 0.329       0.320       2.759   100.579

St008   0.914   B8  0.198   3.7 0.191   0.186   1.029   112.551

        B20 0.188       0.181       0.895   97.891

St009   0.305   B9  0.163   7.8 0.154   0.146   0.532   174.477

        B21 0.146       0.137       0.296   97.190

St010   0.102   B10 0.130   5.1 0.123   0.119   0.096   94.087

        B22 0.121       0.114       Range?  Range?

St011   0.034   B11 0.133   4.7 0.126   0.122   0.134   394.778

        B23 0.125       0.117       Range?  Range?

St012   0.011   B12 0.117   0.7 0.105   0.104   Range?  Range?

        B24 0.115       0.104       Range?  Range?



EC50 = 28.085



AUC = 5565.432



~End

I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.

Thanks!

Edit - Link to example file:

https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0

PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.

Edit 2 - Making some progress:

Read in file and get start and end lines for each segment

inputtext <- readLines("ExampleFile.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)

which I can follow up with

test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)

now I am just trying to figure out how to accurately identify the number of rows to read.

Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev

1319

asked Nov 16 '18 at 13:42

RevDev

1319

asked Nov 16 '18 at 13:42

RevDev

1319

add a comment |

2 Answers
2

active

oldest

votes

I ended up doing the following:

library(tidyverse)



inputtext <- readLines("Test.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)



realend <- (ends - starts) - 2



dff <- list()



for (i in 1:4) {



  dff[[i]] <- read.table("Test.txt",

                   header = T,

                   sep = "t",

                   skip = starts[i],

                   nrows = realend[i],

                   blank.lines.skip = F,

                   row.names = NULL)



}



dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})



dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]



names(dff) <- letters[1:length(dff)]



list2env(dff,.GlobalEnv)

answered Nov 16 '18 at 19:13

RevDev

1319

add a comment |

You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)

inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

dtf1 <- data_frame(input = inputtext) %>% 

    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>% 

    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 

    fill(group) %>% 

    filter(!is.na(x4)) 

head(dtf1)

2. Nest the data frame

dtf2 <- dtf1 %>% 

    group_by(group) %>% 

    nest() 

dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()

colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])

colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

    dtf3 <- dtf2 %>% 

    mutate(names = map(data, slice, 1),

           names = map(names, unlist),

           names = map(names,

                       function(x){ # Replace NA column names by the default x_ names

                           x[is.na(x)] <- names(x[is.na(x)])

                           return(x)

                       }),

           data = map2(data, names, setNames), 

           data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]

# A tibble: 8 x 9

  Sample             Wells Values MeanValue CV    `Od-bkgd-blank` x7    x8    x9   

  <chr>              <chr> <chr>  <chr>     <chr> <chr>           <chr> <chr> <chr>

1 Anti-Hu Det        A11   2.849  2.855     0.282 2.853           NA    NA    NA   

2 ""                 A23   2.860  ""        ""    ""              NA    NA    NA   

3 Coat Control       A12   0.161  0.160     0.530 0.159           NA    NA    NA   

4 ""                 A24   0.160  ""        ""    ""              NA    NA    NA   

5 Diluent Standard 1 A9    0.114  0.113     1.379 0.104           NA    NA    NA   

6 ""                 A21   0.112  ""        ""    ""              NA    NA    NA   

7 Diluent Standard 2 A8    0.012  0.013     2.817 0.012           NA    NA    NA   

8 ""                 A20   0.013  ""        ""    ""              NA    NA    NA   

> dtf3$data[dtf3$group=="Standards"][[1]]

# A tibble: 24 x 9

   Sample ExpConc  Wells OD    `CV OD` ODblank MeanODBlank Result   `%Recovery`

   <chr>  <chr>    <chr> <chr> <chr>   <chr>   <chr>       <chr>    <chr>      

 1 St001  2000.000 B1    2.939 1.4     2.932   2.904       Range?   Range?     

 2 ""     ""       B13   2.882 ""      2.875   ""          1153.779 57.689     

 3 St002  666.667  B2    2.820 0.9     2.812   2.829       456.435  68.465     

 4 ""     ""       B14   2.855 ""      2.847   ""          670.358  100.554    

 5 St003  222.222  B3    2.677 0.9     2.669   2.686       208.849  93.982     

 6 ""     ""       B15   2.709 ""      2.702   ""          237.852  107.033    

 7 St004  74.074   B4    2.215 1.4     2.205   2.226       72.185   97.449     

 8 ""     ""       B16   2.258 ""      2.248   ""          77.452   104.560    

 9 St005  24.691   B5    1.406 1.3     1.397   1.410       24.296   98.397     

10 ""     ""       B17   1.433 ""      1.424   ""          25.153   101.868    

# ... with 14 more rows

>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339042%2flooking-for-advice-on-processing-variable-length-text-files%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I ended up doing the following:

library(tidyverse)



inputtext <- readLines("Test.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)



realend <- (ends - starts) - 2



dff <- list()



for (i in 1:4) {



  dff[[i]] <- read.table("Test.txt",

                   header = T,

                   sep = "t",

                   skip = starts[i],

                   nrows = realend[i],

                   blank.lines.skip = F,

                   row.names = NULL)



}



dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})



dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]



names(dff) <- letters[1:length(dff)]



list2env(dff,.GlobalEnv)

answered Nov 16 '18 at 19:13

RevDev

1319

add a comment |

I ended up doing the following:

library(tidyverse)



inputtext <- readLines("Test.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)



realend <- (ends - starts) - 2



dff <- list()



for (i in 1:4) {



  dff[[i]] <- read.table("Test.txt",

                   header = T,

                   sep = "t",

                   skip = starts[i],

                   nrows = realend[i],

                   blank.lines.skip = F,

                   row.names = NULL)



}



dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})



dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]



names(dff) <- letters[1:length(dff)]



list2env(dff,.GlobalEnv)

answered Nov 16 '18 at 19:13

RevDev

1319

add a comment |

I ended up doing the following:

library(tidyverse)



inputtext <- readLines("Test.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)



realend <- (ends - starts) - 2



dff <- list()



for (i in 1:4) {



  dff[[i]] <- read.table("Test.txt",

                   header = T,

                   sep = "t",

                   skip = starts[i],

                   nrows = realend[i],

                   blank.lines.skip = F,

                   row.names = NULL)



}



dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})



dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]



names(dff) <- letters[1:length(dff)]



list2env(dff,.GlobalEnv)

answered Nov 16 '18 at 19:13

RevDev

1319

I ended up doing the following:

library(tidyverse)



inputtext <- readLines("Test.txt")



starts <- grep("Group:", inputtext)

ends <- grep("~End", inputtext)



realend <- (ends - starts) - 2



dff <- list()



for (i in 1:4) {



  dff[[i]] <- read.table("Test.txt",

                   header = T,

                   sep = "t",

                   skip = starts[i],

                   nrows = realend[i],

                   blank.lines.skip = F,

                   row.names = NULL)



}



dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})



dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]



names(dff) <- letters[1:length(dff)]



list2env(dff,.GlobalEnv)

answered Nov 16 '18 at 19:13

RevDev

1319

answered Nov 16 '18 at 19:13

RevDev

1319

answered Nov 16 '18 at 19:13

RevDev

1319

answered Nov 16 '18 at 19:13

RevDev

1319

add a comment |

You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)

inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

dtf1 <- data_frame(input = inputtext) %>% 

    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>% 

    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 

    fill(group) %>% 

    filter(!is.na(x4)) 

head(dtf1)

2. Nest the data frame

dtf2 <- dtf1 %>% 

    group_by(group) %>% 

    nest() 

dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()

colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])

colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

    dtf3 <- dtf2 %>% 

    mutate(names = map(data, slice, 1),

           names = map(names, unlist),

           names = map(names,

                       function(x){ # Replace NA column names by the default x_ names

                           x[is.na(x)] <- names(x[is.na(x)])

                           return(x)

                       }),

           data = map2(data, names, setNames), 

           data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]

# A tibble: 8 x 9

  Sample             Wells Values MeanValue CV    `Od-bkgd-blank` x7    x8    x9   

  <chr>              <chr> <chr>  <chr>     <chr> <chr>           <chr> <chr> <chr>

1 Anti-Hu Det        A11   2.849  2.855     0.282 2.853           NA    NA    NA   

2 ""                 A23   2.860  ""        ""    ""              NA    NA    NA   

3 Coat Control       A12   0.161  0.160     0.530 0.159           NA    NA    NA   

4 ""                 A24   0.160  ""        ""    ""              NA    NA    NA   

5 Diluent Standard 1 A9    0.114  0.113     1.379 0.104           NA    NA    NA   

6 ""                 A21   0.112  ""        ""    ""              NA    NA    NA   

7 Diluent Standard 2 A8    0.012  0.013     2.817 0.012           NA    NA    NA   

8 ""                 A20   0.013  ""        ""    ""              NA    NA    NA   

> dtf3$data[dtf3$group=="Standards"][[1]]

# A tibble: 24 x 9

   Sample ExpConc  Wells OD    `CV OD` ODblank MeanODBlank Result   `%Recovery`

   <chr>  <chr>    <chr> <chr> <chr>   <chr>   <chr>       <chr>    <chr>      

 1 St001  2000.000 B1    2.939 1.4     2.932   2.904       Range?   Range?     

 2 ""     ""       B13   2.882 ""      2.875   ""          1153.779 57.689     

 3 St002  666.667  B2    2.820 0.9     2.812   2.829       456.435  68.465     

 4 ""     ""       B14   2.855 ""      2.847   ""          670.358  100.554    

 5 St003  222.222  B3    2.677 0.9     2.669   2.686       208.849  93.982     

 6 ""     ""       B15   2.709 ""      2.702   ""          237.852  107.033    

 7 St004  74.074   B4    2.215 1.4     2.205   2.226       72.185   97.449     

 8 ""     ""       B16   2.258 ""      2.248   ""          77.452   104.560    

 9 St005  24.691   B5    1.406 1.3     1.397   1.410       24.296   98.397     

10 ""     ""       B17   1.433 ""      1.424   ""          25.153   101.868    

# ... with 14 more rows

>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

add a comment |

You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)

inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

dtf1 <- data_frame(input = inputtext) %>% 

    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>% 

    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 

    fill(group) %>% 

    filter(!is.na(x4)) 

head(dtf1)

2. Nest the data frame

dtf2 <- dtf1 %>% 

    group_by(group) %>% 

    nest() 

dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()

colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])

colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

    dtf3 <- dtf2 %>% 

    mutate(names = map(data, slice, 1),

           names = map(names, unlist),

           names = map(names,

                       function(x){ # Replace NA column names by the default x_ names

                           x[is.na(x)] <- names(x[is.na(x)])

                           return(x)

                       }),

           data = map2(data, names, setNames), 

           data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]

# A tibble: 8 x 9

  Sample             Wells Values MeanValue CV    `Od-bkgd-blank` x7    x8    x9   

  <chr>              <chr> <chr>  <chr>     <chr> <chr>           <chr> <chr> <chr>

1 Anti-Hu Det        A11   2.849  2.855     0.282 2.853           NA    NA    NA   

2 ""                 A23   2.860  ""        ""    ""              NA    NA    NA   

3 Coat Control       A12   0.161  0.160     0.530 0.159           NA    NA    NA   

4 ""                 A24   0.160  ""        ""    ""              NA    NA    NA   

5 Diluent Standard 1 A9    0.114  0.113     1.379 0.104           NA    NA    NA   

6 ""                 A21   0.112  ""        ""    ""              NA    NA    NA   

7 Diluent Standard 2 A8    0.012  0.013     2.817 0.012           NA    NA    NA   

8 ""                 A20   0.013  ""        ""    ""              NA    NA    NA   

> dtf3$data[dtf3$group=="Standards"][[1]]

# A tibble: 24 x 9

   Sample ExpConc  Wells OD    `CV OD` ODblank MeanODBlank Result   `%Recovery`

   <chr>  <chr>    <chr> <chr> <chr>   <chr>   <chr>       <chr>    <chr>      

 1 St001  2000.000 B1    2.939 1.4     2.932   2.904       Range?   Range?     

 2 ""     ""       B13   2.882 ""      2.875   ""          1153.779 57.689     

 3 St002  666.667  B2    2.820 0.9     2.812   2.829       456.435  68.465     

 4 ""     ""       B14   2.855 ""      2.847   ""          670.358  100.554    

 5 St003  222.222  B3    2.677 0.9     2.669   2.686       208.849  93.982     

 6 ""     ""       B15   2.709 ""      2.702   ""          237.852  107.033    

 7 St004  74.074   B4    2.215 1.4     2.205   2.226       72.185   97.449     

 8 ""     ""       B16   2.258 ""      2.248   ""          77.452   104.560    

 9 St005  24.691   B5    1.406 1.3     1.397   1.410       24.296   98.397     

10 ""     ""       B17   1.433 ""      1.424   ""          25.153   101.868    

# ... with 14 more rows

>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

add a comment |

You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)

inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

dtf1 <- data_frame(input = inputtext) %>% 

    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>% 

    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 

    fill(group) %>% 

    filter(!is.na(x4)) 

head(dtf1)

2. Nest the data frame

dtf2 <- dtf1 %>% 

    group_by(group) %>% 

    nest() 

dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()

colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])

colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

    dtf3 <- dtf2 %>% 

    mutate(names = map(data, slice, 1),

           names = map(names, unlist),

           names = map(names,

                       function(x){ # Replace NA column names by the default x_ names

                           x[is.na(x)] <- names(x[is.na(x)])

                           return(x)

                       }),

           data = map2(data, names, setNames), 

           data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]

# A tibble: 8 x 9

  Sample             Wells Values MeanValue CV    `Od-bkgd-blank` x7    x8    x9   

  <chr>              <chr> <chr>  <chr>     <chr> <chr>           <chr> <chr> <chr>

1 Anti-Hu Det        A11   2.849  2.855     0.282 2.853           NA    NA    NA   

2 ""                 A23   2.860  ""        ""    ""              NA    NA    NA   

3 Coat Control       A12   0.161  0.160     0.530 0.159           NA    NA    NA   

4 ""                 A24   0.160  ""        ""    ""              NA    NA    NA   

5 Diluent Standard 1 A9    0.114  0.113     1.379 0.104           NA    NA    NA   

6 ""                 A21   0.112  ""        ""    ""              NA    NA    NA   

7 Diluent Standard 2 A8    0.012  0.013     2.817 0.012           NA    NA    NA   

8 ""                 A20   0.013  ""        ""    ""              NA    NA    NA   

> dtf3$data[dtf3$group=="Standards"][[1]]

# A tibble: 24 x 9

   Sample ExpConc  Wells OD    `CV OD` ODblank MeanODBlank Result   `%Recovery`

   <chr>  <chr>    <chr> <chr> <chr>   <chr>   <chr>       <chr>    <chr>      

 1 St001  2000.000 B1    2.939 1.4     2.932   2.904       Range?   Range?     

 2 ""     ""       B13   2.882 ""      2.875   ""          1153.779 57.689     

 3 St002  666.667  B2    2.820 0.9     2.812   2.829       456.435  68.465     

 4 ""     ""       B14   2.855 ""      2.847   ""          670.358  100.554    

 5 St003  222.222  B3    2.677 0.9     2.669   2.686       208.849  93.982     

 6 ""     ""       B15   2.709 ""      2.702   ""          237.852  107.033    

 7 St004  74.074   B4    2.215 1.4     2.205   2.226       72.185   97.449     

 8 ""     ""       B16   2.258 ""      2.248   ""          77.452   104.560    

 9 St005  24.691   B5    1.406 1.3     1.397   1.410       24.296   98.397     

10 ""     ""       B17   1.433 ""      1.424   ""          25.153   101.868    

# ... with 14 more rows

>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)

inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

dtf1 <- data_frame(input = inputtext) %>% 

    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>% 

    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 

    fill(group) %>% 

    filter(!is.na(x4)) 

head(dtf1)

2. Nest the data frame

dtf2 <- dtf1 %>% 

    group_by(group) %>% 

    nest() 

dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()

colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])

colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

    dtf3 <- dtf2 %>% 

    mutate(names = map(data, slice, 1),

           names = map(names, unlist),

           names = map(names,

                       function(x){ # Replace NA column names by the default x_ names

                           x[is.na(x)] <- names(x[is.na(x)])

                           return(x)

                       }),

           data = map2(data, names, setNames), 

           data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]

# A tibble: 8 x 9

  Sample             Wells Values MeanValue CV    `Od-bkgd-blank` x7    x8    x9   

  <chr>              <chr> <chr>  <chr>     <chr> <chr>           <chr> <chr> <chr>

1 Anti-Hu Det        A11   2.849  2.855     0.282 2.853           NA    NA    NA   

2 ""                 A23   2.860  ""        ""    ""              NA    NA    NA   

3 Coat Control       A12   0.161  0.160     0.530 0.159           NA    NA    NA   

4 ""                 A24   0.160  ""        ""    ""              NA    NA    NA   

5 Diluent Standard 1 A9    0.114  0.113     1.379 0.104           NA    NA    NA   

6 ""                 A21   0.112  ""        ""    ""              NA    NA    NA   

7 Diluent Standard 2 A8    0.012  0.013     2.817 0.012           NA    NA    NA   

8 ""                 A20   0.013  ""        ""    ""              NA    NA    NA   

> dtf3$data[dtf3$group=="Standards"][[1]]

# A tibble: 24 x 9

   Sample ExpConc  Wells OD    `CV OD` ODblank MeanODBlank Result   `%Recovery`

   <chr>  <chr>    <chr> <chr> <chr>   <chr>   <chr>       <chr>    <chr>      

 1 St001  2000.000 B1    2.939 1.4     2.932   2.904       Range?   Range?     

 2 ""     ""       B13   2.882 ""      2.875   ""          1153.779 57.689     

 3 St002  666.667  B2    2.820 0.9     2.812   2.829       456.435  68.465     

 4 ""     ""       B14   2.855 ""      2.847   ""          670.358  100.554    

 5 St003  222.222  B3    2.677 0.9     2.669   2.686       208.849  93.982     

 6 ""     ""       B15   2.709 ""      2.702   ""          237.852  107.033    

 7 St004  74.074   B4    2.215 1.4     2.205   2.226       72.185   97.449     

 8 ""     ""       B16   2.258 ""      2.248   ""          77.452   104.560    

 9 St005  24.691   B5    1.406 1.3     1.397   1.410       24.296   98.397     

10 ""     ""       B17   1.433 ""      1.424   ""          25.153   101.868    

# ... with 14 more rows

>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

answered Nov 16 '18 at 14:17

Paul Rougieux

4,35612458

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

add a comment |

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

– RevDev
Nov 16 '18 at 14:38

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

zNOG6kX ekTC72YskwH9rm,ajJun1 wBN qlMUdkfvVIbMssyg4k,4L OW4v 2f31nsiL1IUiMOR03xNrEHJZ1 OFdq,2yII V

搜尋此網誌

Vfrdtyky

Looking for advice on processing variable length text files

2 Answers
2

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

Post as a guest

Popular posts from this blog

Bressuire

Vorschmack

Xamarin.iOS Cant Deploy on Iphone

Looking for advice on processing variable length text files

2 Answers 2

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

1. Read data in a single data frame, create a column with the group name

2. Nest the data frame

3. Give column names from the data to each sub data frame

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Bressuire

Vorschmack

Xamarin.iOS Cant Deploy on Iphone

2 Answers
2

2 Answers
2

2 Answers
2