Looking for advice on processing variable length text files





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







1















I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.



The table included in each segment should start 1 line below "Group:" and stop either 2 lines above "~End" if the group is a control or 6 lines above "~End" if if the group is a standard. The lengths of the tables themselves will be variable and can be empty like the segment "SpikedControl".



and example file looks like this:



Group:  Controls    1
Sample Wells Values MeanValue CV Od-bkgd-blank
Anti-Hu Det A11 2.849 2.855 0.282 2.853
A23 2.860
Coat Control A12 0.161 0.160 0.530 0.159
A24 0.160
Diluent Standard 1 A9 0.114 0.113 1.379 0.104
A21 0.112
Diluent Standard 2 A8 0.012 0.013 2.817 0.012
A20 0.013

~End
Group: SpikedControl 1
Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected

~End
Group: Standards 1
Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
B13 2.882 2.875 1153.779 57.689
St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
B14 2.855 2.847 670.358 100.554
St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
B15 2.709 2.702 237.852 107.033
St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
B16 2.258 2.248 77.452 104.560
St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
B17 1.433 1.424 25.153 101.868
St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
B18 0.697 0.686 8.240 100.115
St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
B19 0.329 0.320 2.759 100.579
St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
B20 0.188 0.181 0.895 97.891
St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
B21 0.146 0.137 0.296 97.190
St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
B22 0.121 0.114 Range? Range?
St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
B23 0.125 0.117 Range? Range?
St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
B24 0.115 0.104 Range? Range?

EC50 = 28.085

AUC = 5565.432

~End


I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.



Thanks!



Edit - Link to example file:



https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0



PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.



Edit 2 - Making some progress:



Read in file and get start and end lines for each segment



inputtext <- readLines("ExampleFile.txt")

starts <- grep("Group:", inputtext)
ends <- grep("~End", inputtext)


which I can follow up with



test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)


now I am just trying to figure out how to accurately identify the number of rows to read.



So if the start row is 17, and the end row is 48. Then nrows needs to 24 which is 48 (end row indicated) - 17 (the first rows that are skipped) - 7 (to account for the header line and lines of fluff on the end of the table, which could also be 4 if it is a control table)



Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.










share|improve this question































    1















    I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.



    The table included in each segment should start 1 line below "Group:" and stop either 2 lines above "~End" if the group is a control or 6 lines above "~End" if if the group is a standard. The lengths of the tables themselves will be variable and can be empty like the segment "SpikedControl".



    and example file looks like this:



    Group:  Controls    1
    Sample Wells Values MeanValue CV Od-bkgd-blank
    Anti-Hu Det A11 2.849 2.855 0.282 2.853
    A23 2.860
    Coat Control A12 0.161 0.160 0.530 0.159
    A24 0.160
    Diluent Standard 1 A9 0.114 0.113 1.379 0.104
    A21 0.112
    Diluent Standard 2 A8 0.012 0.013 2.817 0.012
    A20 0.013

    ~End
    Group: SpikedControl 1
    Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected

    ~End
    Group: Standards 1
    Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
    St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
    B13 2.882 2.875 1153.779 57.689
    St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
    B14 2.855 2.847 670.358 100.554
    St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
    B15 2.709 2.702 237.852 107.033
    St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
    B16 2.258 2.248 77.452 104.560
    St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
    B17 1.433 1.424 25.153 101.868
    St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
    B18 0.697 0.686 8.240 100.115
    St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
    B19 0.329 0.320 2.759 100.579
    St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
    B20 0.188 0.181 0.895 97.891
    St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
    B21 0.146 0.137 0.296 97.190
    St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
    B22 0.121 0.114 Range? Range?
    St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
    B23 0.125 0.117 Range? Range?
    St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
    B24 0.115 0.104 Range? Range?

    EC50 = 28.085

    AUC = 5565.432

    ~End


    I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.



    Thanks!



    Edit - Link to example file:



    https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0



    PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.



    Edit 2 - Making some progress:



    Read in file and get start and end lines for each segment



    inputtext <- readLines("ExampleFile.txt")

    starts <- grep("Group:", inputtext)
    ends <- grep("~End", inputtext)


    which I can follow up with



    test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)


    now I am just trying to figure out how to accurately identify the number of rows to read.



    So if the start row is 17, and the end row is 48. Then nrows needs to 24 which is 48 (end row indicated) - 17 (the first rows that are skipped) - 7 (to account for the header line and lines of fluff on the end of the table, which could also be 4 if it is a control table)



    Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.










    share|improve this question



























      1












      1








      1








      I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.



      The table included in each segment should start 1 line below "Group:" and stop either 2 lines above "~End" if the group is a control or 6 lines above "~End" if if the group is a standard. The lengths of the tables themselves will be variable and can be empty like the segment "SpikedControl".



      and example file looks like this:



      Group:  Controls    1
      Sample Wells Values MeanValue CV Od-bkgd-blank
      Anti-Hu Det A11 2.849 2.855 0.282 2.853
      A23 2.860
      Coat Control A12 0.161 0.160 0.530 0.159
      A24 0.160
      Diluent Standard 1 A9 0.114 0.113 1.379 0.104
      A21 0.112
      Diluent Standard 2 A8 0.012 0.013 2.817 0.012
      A20 0.013

      ~End
      Group: SpikedControl 1
      Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected

      ~End
      Group: Standards 1
      Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
      St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
      B13 2.882 2.875 1153.779 57.689
      St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
      B14 2.855 2.847 670.358 100.554
      St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
      B15 2.709 2.702 237.852 107.033
      St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
      B16 2.258 2.248 77.452 104.560
      St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
      B17 1.433 1.424 25.153 101.868
      St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
      B18 0.697 0.686 8.240 100.115
      St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
      B19 0.329 0.320 2.759 100.579
      St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
      B20 0.188 0.181 0.895 97.891
      St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
      B21 0.146 0.137 0.296 97.190
      St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
      B22 0.121 0.114 Range? Range?
      St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
      B23 0.125 0.117 Range? Range?
      St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
      B24 0.115 0.104 Range? Range?

      EC50 = 28.085

      AUC = 5565.432

      ~End


      I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.



      Thanks!



      Edit - Link to example file:



      https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0



      PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.



      Edit 2 - Making some progress:



      Read in file and get start and end lines for each segment



      inputtext <- readLines("ExampleFile.txt")

      starts <- grep("Group:", inputtext)
      ends <- grep("~End", inputtext)


      which I can follow up with



      test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)


      now I am just trying to figure out how to accurately identify the number of rows to read.



      So if the start row is 17, and the end row is 48. Then nrows needs to 24 which is 48 (end row indicated) - 17 (the first rows that are skipped) - 7 (to account for the header line and lines of fluff on the end of the table, which could also be 4 if it is a control table)



      Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.










      share|improve this question
















      I need to write some code to process a tab delimited text files that has defined segments with variable lengths in R.



      The table included in each segment should start 1 line below "Group:" and stop either 2 lines above "~End" if the group is a control or 6 lines above "~End" if if the group is a standard. The lengths of the tables themselves will be variable and can be empty like the segment "SpikedControl".



      and example file looks like this:



      Group:  Controls    1
      Sample Wells Values MeanValue CV Od-bkgd-blank
      Anti-Hu Det A11 2.849 2.855 0.282 2.853
      A23 2.860
      Coat Control A12 0.161 0.160 0.530 0.159
      A24 0.160
      Diluent Standard 1 A9 0.114 0.113 1.379 0.104
      A21 0.112
      Diluent Standard 2 A8 0.012 0.013 2.817 0.012
      A20 0.013

      ~End
      Group: SpikedControl 1
      Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected

      ~End
      Group: Standards 1
      Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
      St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
      B13 2.882 2.875 1153.779 57.689
      St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
      B14 2.855 2.847 670.358 100.554
      St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
      B15 2.709 2.702 237.852 107.033
      St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
      B16 2.258 2.248 77.452 104.560
      St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
      B17 1.433 1.424 25.153 101.868
      St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
      B18 0.697 0.686 8.240 100.115
      St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
      B19 0.329 0.320 2.759 100.579
      St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
      B20 0.188 0.181 0.895 97.891
      St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
      B21 0.146 0.137 0.296 97.190
      St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
      B22 0.121 0.114 Range? Range?
      St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
      B23 0.125 0.117 Range? Range?
      St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
      B24 0.115 0.104 Range? Range?

      EC50 = 28.085

      AUC = 5565.432

      ~End


      I am not very experienced with processing text files like this, and am looking on some advice on how to approach identifying these segments and reading the tables within.



      Thanks!



      Edit - Link to example file:



      https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0



      PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.



      Edit 2 - Making some progress:



      Read in file and get start and end lines for each segment



      inputtext <- readLines("ExampleFile.txt")

      starts <- grep("Group:", inputtext)
      ends <- grep("~End", inputtext)


      which I can follow up with



      test2 <- read.table("ExampleFile.txt", header = T, sep = "t", skip = 17, nrows = 24, blank.lines.skip = F)


      now I am just trying to figure out how to accurately identify the number of rows to read.



      So if the start row is 17, and the end row is 48. Then nrows needs to 24 which is 48 (end row indicated) - 17 (the first rows that are skipped) - 7 (to account for the header line and lines of fluff on the end of the table, which could also be 4 if it is a control table)



      Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.







      r






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 16 '18 at 17:46







      RevDev

















      asked Nov 16 '18 at 13:42









      RevDevRevDev

      1319




      1319
























          2 Answers
          2






          active

          oldest

          votes


















          1














          I ended up doing the following:



          library(tidyverse)

          inputtext <- readLines("Test.txt")

          starts <- grep("Group:", inputtext)
          ends <- grep("~End", inputtext)

          realend <- (ends - starts) - 2

          dff <- list()

          for (i in 1:4) {

          dff[[i]] <- read.table("Test.txt",
          header = T,
          sep = "t",
          skip = starts[i],
          nrows = realend[i],
          blank.lines.skip = F,
          row.names = NULL)

          }

          dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})

          dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

          names(dff) <- letters[1:length(dff)]

          list2env(dff,.GlobalEnv)





          share|improve this answer































            0














            You can use a nested data frame for that purpose. I detailed the method in 3 steps below.



            library(tidyverse)
            inputtext <- readLines("~/downloads/ExampleFile.txt")


            1. Read data in a single data frame, create a column with the group name



            dtf1 <- data_frame(input = inputtext) %>% 
            separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>%
            mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
            fill(group) %>%
            filter(!is.na(x4))
            head(dtf1)


            2. Nest the data frame



            dtf2 <- dtf1 %>% 
            group_by(group) %>%
            nest()
            dtf2$data[[1]]


            Give column names from the data to the first nested data frame



            colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
            colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
            colnames(dtf2$data[[1]]) <- colnames1


            3. Give column names from the data to each sub data frame



                dtf3 <- dtf2 %>% 
            mutate(names = map(data, slice, 1),
            names = map(names, unlist),
            names = map(names,
            function(x){ # Replace NA column names by the default x_ names
            x[is.na(x)] <- names(x[is.na(x)])
            return(x)
            }),
            data = map2(data, names, setNames),
            data = map(data, slice, -1))


            You now have a list of data frames. You can use the group name to call the corresponding data frame:



             > dtf3$data[dtf3$group=="Controls"][[1]]
            # A tibble: 8 x 9
            Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
            1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
            2 "" A23 2.860 "" "" "" NA NA NA
            3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
            4 "" A24 0.160 "" "" "" NA NA NA
            5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
            6 "" A21 0.112 "" "" "" NA NA NA
            7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
            8 "" A20 0.013 "" "" "" NA NA NA
            > dtf3$data[dtf3$group=="Standards"][[1]]
            # A tibble: 24 x 9
            Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
            1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
            2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
            3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
            4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
            5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
            6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
            7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
            8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
            9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
            10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
            # ... with 14 more rows
            >


            Note: rename based on this answer.



            Placed source data as a gist in case it gets lost:
            spectrophotometer.txt






            share|improve this answer


























            • Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

              – RevDev
              Nov 16 '18 at 14:38












            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339042%2flooking-for-advice-on-processing-variable-length-text-files%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            1














            I ended up doing the following:



            library(tidyverse)

            inputtext <- readLines("Test.txt")

            starts <- grep("Group:", inputtext)
            ends <- grep("~End", inputtext)

            realend <- (ends - starts) - 2

            dff <- list()

            for (i in 1:4) {

            dff[[i]] <- read.table("Test.txt",
            header = T,
            sep = "t",
            skip = starts[i],
            nrows = realend[i],
            blank.lines.skip = F,
            row.names = NULL)

            }

            dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})

            dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

            names(dff) <- letters[1:length(dff)]

            list2env(dff,.GlobalEnv)





            share|improve this answer




























              1














              I ended up doing the following:



              library(tidyverse)

              inputtext <- readLines("Test.txt")

              starts <- grep("Group:", inputtext)
              ends <- grep("~End", inputtext)

              realend <- (ends - starts) - 2

              dff <- list()

              for (i in 1:4) {

              dff[[i]] <- read.table("Test.txt",
              header = T,
              sep = "t",
              skip = starts[i],
              nrows = realend[i],
              blank.lines.skip = F,
              row.names = NULL)

              }

              dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})

              dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

              names(dff) <- letters[1:length(dff)]

              list2env(dff,.GlobalEnv)





              share|improve this answer


























                1












                1








                1







                I ended up doing the following:



                library(tidyverse)

                inputtext <- readLines("Test.txt")

                starts <- grep("Group:", inputtext)
                ends <- grep("~End", inputtext)

                realend <- (ends - starts) - 2

                dff <- list()

                for (i in 1:4) {

                dff[[i]] <- read.table("Test.txt",
                header = T,
                sep = "t",
                skip = starts[i],
                nrows = realend[i],
                blank.lines.skip = F,
                row.names = NULL)

                }

                dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})

                dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

                names(dff) <- letters[1:length(dff)]

                list2env(dff,.GlobalEnv)





                share|improve this answer













                I ended up doing the following:



                library(tidyverse)

                inputtext <- readLines("Test.txt")

                starts <- grep("Group:", inputtext)
                ends <- grep("~End", inputtext)

                realend <- (ends - starts) - 2

                dff <- list()

                for (i in 1:4) {

                dff[[i]] <- read.table("Test.txt",
                header = T,
                sep = "t",
                skip = starts[i],
                nrows = realend[i],
                blank.lines.skip = F,
                row.names = NULL)

                }

                dff <- lapply(dff, function(x) {x[!is.na(x$Values),]})

                dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

                names(dff) <- letters[1:length(dff)]

                list2env(dff,.GlobalEnv)






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 16 '18 at 19:13









                RevDevRevDev

                1319




                1319

























                    0














                    You can use a nested data frame for that purpose. I detailed the method in 3 steps below.



                    library(tidyverse)
                    inputtext <- readLines("~/downloads/ExampleFile.txt")


                    1. Read data in a single data frame, create a column with the group name



                    dtf1 <- data_frame(input = inputtext) %>% 
                    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>%
                    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
                    fill(group) %>%
                    filter(!is.na(x4))
                    head(dtf1)


                    2. Nest the data frame



                    dtf2 <- dtf1 %>% 
                    group_by(group) %>%
                    nest()
                    dtf2$data[[1]]


                    Give column names from the data to the first nested data frame



                    colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
                    colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
                    colnames(dtf2$data[[1]]) <- colnames1


                    3. Give column names from the data to each sub data frame



                        dtf3 <- dtf2 %>% 
                    mutate(names = map(data, slice, 1),
                    names = map(names, unlist),
                    names = map(names,
                    function(x){ # Replace NA column names by the default x_ names
                    x[is.na(x)] <- names(x[is.na(x)])
                    return(x)
                    }),
                    data = map2(data, names, setNames),
                    data = map(data, slice, -1))


                    You now have a list of data frames. You can use the group name to call the corresponding data frame:



                     > dtf3$data[dtf3$group=="Controls"][[1]]
                    # A tibble: 8 x 9
                    Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
                    2 "" A23 2.860 "" "" "" NA NA NA
                    3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
                    4 "" A24 0.160 "" "" "" NA NA NA
                    5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
                    6 "" A21 0.112 "" "" "" NA NA NA
                    7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
                    8 "" A20 0.013 "" "" "" NA NA NA
                    > dtf3$data[dtf3$group=="Standards"][[1]]
                    # A tibble: 24 x 9
                    Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
                    2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
                    3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
                    4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
                    5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
                    6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
                    7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
                    8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
                    9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
                    10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
                    # ... with 14 more rows
                    >


                    Note: rename based on this answer.



                    Placed source data as a gist in case it gets lost:
                    spectrophotometer.txt






                    share|improve this answer


























                    • Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                      – RevDev
                      Nov 16 '18 at 14:38
















                    0














                    You can use a nested data frame for that purpose. I detailed the method in 3 steps below.



                    library(tidyverse)
                    inputtext <- readLines("~/downloads/ExampleFile.txt")


                    1. Read data in a single data frame, create a column with the group name



                    dtf1 <- data_frame(input = inputtext) %>% 
                    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>%
                    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
                    fill(group) %>%
                    filter(!is.na(x4))
                    head(dtf1)


                    2. Nest the data frame



                    dtf2 <- dtf1 %>% 
                    group_by(group) %>%
                    nest()
                    dtf2$data[[1]]


                    Give column names from the data to the first nested data frame



                    colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
                    colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
                    colnames(dtf2$data[[1]]) <- colnames1


                    3. Give column names from the data to each sub data frame



                        dtf3 <- dtf2 %>% 
                    mutate(names = map(data, slice, 1),
                    names = map(names, unlist),
                    names = map(names,
                    function(x){ # Replace NA column names by the default x_ names
                    x[is.na(x)] <- names(x[is.na(x)])
                    return(x)
                    }),
                    data = map2(data, names, setNames),
                    data = map(data, slice, -1))


                    You now have a list of data frames. You can use the group name to call the corresponding data frame:



                     > dtf3$data[dtf3$group=="Controls"][[1]]
                    # A tibble: 8 x 9
                    Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
                    2 "" A23 2.860 "" "" "" NA NA NA
                    3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
                    4 "" A24 0.160 "" "" "" NA NA NA
                    5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
                    6 "" A21 0.112 "" "" "" NA NA NA
                    7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
                    8 "" A20 0.013 "" "" "" NA NA NA
                    > dtf3$data[dtf3$group=="Standards"][[1]]
                    # A tibble: 24 x 9
                    Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
                    2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
                    3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
                    4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
                    5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
                    6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
                    7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
                    8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
                    9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
                    10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
                    # ... with 14 more rows
                    >


                    Note: rename based on this answer.



                    Placed source data as a gist in case it gets lost:
                    spectrophotometer.txt






                    share|improve this answer


























                    • Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                      – RevDev
                      Nov 16 '18 at 14:38














                    0












                    0








                    0







                    You can use a nested data frame for that purpose. I detailed the method in 3 steps below.



                    library(tidyverse)
                    inputtext <- readLines("~/downloads/ExampleFile.txt")


                    1. Read data in a single data frame, create a column with the group name



                    dtf1 <- data_frame(input = inputtext) %>% 
                    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>%
                    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
                    fill(group) %>%
                    filter(!is.na(x4))
                    head(dtf1)


                    2. Nest the data frame



                    dtf2 <- dtf1 %>% 
                    group_by(group) %>%
                    nest()
                    dtf2$data[[1]]


                    Give column names from the data to the first nested data frame



                    colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
                    colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
                    colnames(dtf2$data[[1]]) <- colnames1


                    3. Give column names from the data to each sub data frame



                        dtf3 <- dtf2 %>% 
                    mutate(names = map(data, slice, 1),
                    names = map(names, unlist),
                    names = map(names,
                    function(x){ # Replace NA column names by the default x_ names
                    x[is.na(x)] <- names(x[is.na(x)])
                    return(x)
                    }),
                    data = map2(data, names, setNames),
                    data = map(data, slice, -1))


                    You now have a list of data frames. You can use the group name to call the corresponding data frame:



                     > dtf3$data[dtf3$group=="Controls"][[1]]
                    # A tibble: 8 x 9
                    Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
                    2 "" A23 2.860 "" "" "" NA NA NA
                    3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
                    4 "" A24 0.160 "" "" "" NA NA NA
                    5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
                    6 "" A21 0.112 "" "" "" NA NA NA
                    7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
                    8 "" A20 0.013 "" "" "" NA NA NA
                    > dtf3$data[dtf3$group=="Standards"][[1]]
                    # A tibble: 24 x 9
                    Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
                    2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
                    3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
                    4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
                    5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
                    6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
                    7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
                    8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
                    9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
                    10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
                    # ... with 14 more rows
                    >


                    Note: rename based on this answer.



                    Placed source data as a gist in case it gets lost:
                    spectrophotometer.txt






                    share|improve this answer















                    You can use a nested data frame for that purpose. I detailed the method in 3 steps below.



                    library(tidyverse)
                    inputtext <- readLines("~/downloads/ExampleFile.txt")


                    1. Read data in a single data frame, create a column with the group name



                    dtf1 <- data_frame(input = inputtext) %>% 
                    separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "t") %>%
                    mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
                    fill(group) %>%
                    filter(!is.na(x4))
                    head(dtf1)


                    2. Nest the data frame



                    dtf2 <- dtf1 %>% 
                    group_by(group) %>%
                    nest()
                    dtf2$data[[1]]


                    Give column names from the data to the first nested data frame



                    colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
                    colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
                    colnames(dtf2$data[[1]]) <- colnames1


                    3. Give column names from the data to each sub data frame



                        dtf3 <- dtf2 %>% 
                    mutate(names = map(data, slice, 1),
                    names = map(names, unlist),
                    names = map(names,
                    function(x){ # Replace NA column names by the default x_ names
                    x[is.na(x)] <- names(x[is.na(x)])
                    return(x)
                    }),
                    data = map2(data, names, setNames),
                    data = map(data, slice, -1))


                    You now have a list of data frames. You can use the group name to call the corresponding data frame:



                     > dtf3$data[dtf3$group=="Controls"][[1]]
                    # A tibble: 8 x 9
                    Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
                    2 "" A23 2.860 "" "" "" NA NA NA
                    3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
                    4 "" A24 0.160 "" "" "" NA NA NA
                    5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
                    6 "" A21 0.112 "" "" "" NA NA NA
                    7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
                    8 "" A20 0.013 "" "" "" NA NA NA
                    > dtf3$data[dtf3$group=="Standards"][[1]]
                    # A tibble: 24 x 9
                    Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
                    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
                    1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
                    2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
                    3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
                    4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
                    5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
                    6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
                    7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
                    8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
                    9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
                    10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
                    # ... with 14 more rows
                    >


                    Note: rename based on this answer.



                    Placed source data as a gist in case it gets lost:
                    spectrophotometer.txt







                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 20 '18 at 8:41

























                    answered Nov 16 '18 at 14:17









                    Paul RougieuxPaul Rougieux

                    4,35612458




                    4,35612458













                    • Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                      – RevDev
                      Nov 16 '18 at 14:38



















                    • Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                      – RevDev
                      Nov 16 '18 at 14:38

















                    Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                    – RevDev
                    Nov 16 '18 at 14:38





                    Ah i probably screwed it up when I trimmed it down for the example, I have edited my post with a link to a full file example.

                    – RevDev
                    Nov 16 '18 at 14:38


















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339042%2flooking-for-advice-on-processing-variable-length-text-files%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Xamarin.iOS Cant Deploy on Iphone

                    Glorious Revolution

                    Dulmage-Mendelsohn matrix decomposition in Python