How to extract images from an uploaded Word document in Shiny

I am working on a Shiny app that reads Word documents uploaded by users. The app then displays a table of all elements in the uploaded document and their formatting. I want it to also show any pictures from the uploaded Word doc. Documents containing multiple images aren't an issue: users will only ever upload documents with one image.

To do this, I am using the officer package. It has a function called media_extract that should do exactly what I want. The issue is that, while the documentation says this function can be used to extract images from Word (.docx) or PowerPoint (.pptx) documents, I can only get it to work for the latter. This is because media_extract takes the path of the image inside the document as an argument, and I cannot find that path for Word docs. The path comes from one of two officer functions, depending on the file type: docx_summary or pptx_summary. These are also the functions I use to generate the tables rendered in my app. pptx_summary creates a table with a media_file column, which holds the path for image elements, while docx_summary generates no such column. Without that column and the path it contains, I don't know how to extract images from Word docs using this function.

For your convenience, here is my code for two Shiny apps: one that reads PowerPoint files and one that reads Word docs. If you upload a PowerPoint file and a Word file that each include an image, you will see how the tables generated in each app differ. My PowerPoint app also renders an image, to show you how that is done. Obviously that functionality is not in my Word app...

PowerPoint reader app:

library(officer)
library(DT)
library(shiny)

ui <- fluidPage(
  titlePanel("Document Scanner"),
  sidebarLayout(
    sidebarPanel(
      # read_pptx() only reads .pptx files
      fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                accept = c(".pptx"))
    ),
    mainPanel(
      tags$h3(tags$b("Document Summary")),
      br(),
      DT::dataTableOutput("display_table"),
      br(),
      imageOutput("myImage")
    )
  )
)

server <- function(input, output) {
  # creating reactive value for uploaded file (waits until a file is uploaded)
  x <- reactive({
    req(input$uploadedfile)
    read_pptx(input$uploadedfile$datapath)
  })

  # rendering formatting table
  output$display_table <- DT::renderDataTable({
    DT::datatable(pptx_summary(x()))
  })

  # rendering images from PowerPoint
  output$myImage <- renderImage({
    readFile <- x()
    fileSummaryDF <- pptx_summary(readFile)
    # Getting path to image (this is basically straight from the
    # documentation for media_extract)
    fileSummaryDF_filtered <- fileSummaryDF[fileSummaryDF$content_type %in% "image", ]
    media_file <- fileSummaryDF_filtered$media_file
    png_file <- tempfile(fileext = ".png")
    media_extract(readFile, path = media_file, target = png_file)
    list(src = png_file,
         alt = "Test Picture")
  }, deleteFile = TRUE)
}

shinyApp(ui, server)

Word reader app:

library(officer)
library(DT)
library(shiny)

ui <- fluidPage(
  titlePanel("Word Doc Scanner"),
  sidebarLayout(
    sidebarPanel(
      # read_docx() only reads .docx files
      fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                accept = c(".docx"))
    ),
    mainPanel(
      tags$h3(tags$b("Document Summary")),
      br(),
      DT::dataTableOutput("display_table"),
      imageOutput("image1")
    )
  )
)

server <- function(input, output) {
  # creating reactive content (the summary table) from uploaded file
  x <- reactive({
    req(input$uploadedfile)
    docDF <- read_docx(path = input$uploadedfile$datapath)
    docx_summary(docDF)
  })

  # rendering formatting table
  output$display_table <- DT::renderDataTable({
    DT::datatable(x())
  })

  # how to render image without an image path anywhere in table?
}

shinyApp(ui, server)

If this can't be done in officer then I'm happy to do it a different way. Thank you.

asked Nov 13 '18 at 21:16

IanG

It's really just a ZIP file. Rename it to a tempfile with a .zip and use unzip() to unzip it. Look for a word/media/ subdir and the images are there. The source for the docxtractr package has code for the zip part.

– hrbrmstr
Nov 14 '18 at 2:09
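
For anyone who wants to try the ZIP route described in this comment, here is a minimal sketch in base R. The helper names (is_media_entry, list_docx_media, extract_docx_media) are made up for illustration and are not part of officer or docxtractr. As far as I know, utils::unzip() does not look at the file extension, so the rename step may not be needed; treat that as an assumption to check on your own setup.

```r
# Hypothetical helpers (not from any package): find and extract the
# images stored under word/media/ inside a .docx archive.

# TRUE for archive entries that are files under word/media/
is_media_entry <- function(entry_names) {
  grepl("^word/media/.+", entry_names)
}

# List the media entries without extracting anything
list_docx_media <- function(docx_path) {
  entries <- utils::unzip(docx_path, list = TRUE)$Name
  entries[is_media_entry(entries)]
}

# Extract only the media entries into exdir and return their paths on disk
extract_docx_media <- function(docx_path, exdir = tempfile("docx_media_")) {
  media <- list_docx_media(docx_path)
  if (length(media) == 0) {
    return(character(0))
  }
  extracted <- utils::unzip(docx_path, files = media, exdir = exdir)
  extracted
}
```

In a Shiny app, docx_path would be input$uploadedfile$datapath, and the first element of the returned vector could be passed as src to renderImage.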

I was indeed able to find the image by using the word/media/ subdir and thus create the path to it. That was simple enough. I did not need to do any unzipping as you mentioned. Anyway, problem solved. Thank you.

– IanG
Nov 14 '18 at 21:16

A Word doc is a zipped archive, so something had to do the unzipping.

– hrbrmstr
Nov 14 '18 at 21:17

Then I assume the read_docx function from the officer package must have unzipped it.

– IanG
Nov 15 '18 at 5:00
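
The approach discussed in these comments might look roughly like the sketch below. It assumes that the object returned by read_docx() keeps the folder it unpacked the document into in a package_dir element; that is an internal detail of officer and could change between versions. The helper docx_media_files is a made-up name, and the output id image1 matches the Word reader app in the question.

```r
# Hypothetical helper: list image files in the folder officer unpacked.
# Assumption: an rdocx object exposes that folder as `package_dir`
# (internal to officer, not a documented interface).
docx_media_files <- function(package_dir) {
  media_dir <- file.path(package_dir, "word", "media")
  if (!dir.exists(media_dir)) {
    return(character(0))
  }
  list.files(media_dir, full.names = TRUE)
}

server <- function(input, output) {
  # read the uploaded .docx once; officer unpacks it while reading
  doc <- shiny::reactive({
    shiny::req(input$uploadedfile)
    officer::read_docx(path = input$uploadedfile$datapath)
  })

  output$image1 <- shiny::renderImage({
    images <- docx_media_files(doc()$package_dir)
    shiny::req(length(images) > 0)
    # serve a copy so Shiny can delete it without touching officer's folder
    shown <- tempfile(fileext = paste0(".", tools::file_ext(images[1])))
    file.copy(images[1], shown)
    list(src = shown, alt = "Image from uploaded Word document")
  }, deleteFile = TRUE)
}
```

Only the first image is shown, which fits the one-image-per-document case from the question. Formats that browsers do not display (for example .emf) would need converting first.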


r shiny officer
