How to extract images from uploaded word document in Shiny












I am working on a Shiny app that reads Word documents uploaded by users. The app displays a table of all elements in the uploaded document and their formatting. I want it to also show any pictures from the uploaded Word document. Documents containing multiple images aren't an issue: users will only ever upload documents with one image.



To do this, I am using the officer package. It has a function called media_extract that does exactly what I want. The issue is that, while the documentation says this function can extract images from Word (.docx) or PowerPoint (.pptx) files, I can only get it to work for the latter. This is because media_extract takes the image file path as an argument, but I cannot generate a file path for Word docs. The file path comes from one of two officer functions, depending on the file type: docx_summary or pptx_summary. These are also the functions I use to generate the tables rendered in my app. pptx_summary creates a table with a media_file column, which holds a file path for image elements, while docx_summary generates no such column. Absent that column and the path it includes, I don't know how to extract images from Word docs using this function.



For your convenience, here is my code for two Shiny apps: one that reads PowerPoint files and one for Word docs. If you upload a PowerPoint file and a Word file that each include an image, you will see how the tables generated in each app differ. My PowerPoint app also renders an image, to show how that is done. That functionality is missing from my Word app.



PowerPoint reader app:

    library(officer)
    library(DT)
    library(shiny)

    ui <- fluidPage(
      titlePanel("Document Scanner"),
      sidebarLayout(
        sidebarPanel(
          fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                    accept = c(".ppt", ".pptx", ".docx"))
        ),
        mainPanel(
          tags$h3(tags$b("Document Summary")),
          br(),
          DT::dataTableOutput("display_table"),
          br(),
          imageOutput("myImage")
        )
      )
    )

    server <- function(input, output) {
      # creating reactive value for uploaded file
      x <- reactive({
        # wait until a file has been uploaded before reading it
        req(input$uploadedfile)
        uploadedfileDF <- input$uploadedfile
        uploadedfileDataPath <- uploadedfileDF$datapath
        read_pptx(uploadedfileDataPath)
      })

      # rendering formatting table
      output$display_table <- DT::renderDataTable({
        req(input$uploadedfile)
        DT::datatable(pptx_summary(x()))
      })

      # rendering images from powerpoint
      output$myImage <- renderImage({
        readFile <- x()
        fileSummaryDF <- pptx_summary(readFile)
        # Getting path to image (this is basically straight from the
        # documentation for media_extract)
        fileSummaryDF_filtered <- fileSummaryDF[fileSummaryDF$content_type %in% "image", ]
        media_file <- fileSummaryDF_filtered$media_file
        png_file <- tempfile(fileext = ".png")
        media_extract(readFile, path = media_file, target = png_file)

        list(src = png_file,
             alt = "Test Picture")
      }, deleteFile = TRUE)
    }

    shinyApp(ui, server)


Word reader app:

    library(officer)
    library(DT)
    library(shiny)

    ui <- fluidPage(
      titlePanel("Word Doc Scanner"),
      sidebarLayout(
        sidebarPanel(
          fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                    accept = c(".doc", ".docx"))
        ),
        mainPanel(
          tags$h3(tags$b("Document Summary")),
          br(),
          DT::dataTableOutput("display_table"),
          imageOutput("image1")
        )
      )
    )

    server <- function(input, output) {
      # creating reactive content from uploaded file
      x <- reactive({
        print(input$uploadedfile)
        uploadedfileDF <- input$uploadedfile
        uploadedfileDataPath <- uploadedfileDF$datapath
        docDF <- read_docx(path = uploadedfileDataPath)
        summaryDF <- docx_summary(docDF)
      })

      # rendering formatting table
      output$display_table <- DT::renderDataTable({
        req(input$uploadedfile)
        DT::datatable(x())
      })

      # how to render image without an image path anywhere in table?
    }

    shinyApp(ui, server)


If this can't be done in officer then I'm happy to do it a different way. Thank you.










Tags: r, shiny, officer






asked Nov 13 '18 at 21:16 by IanG













  • It's really just a ZIP file. Rename it to a tempfile with a .zip extension and use unzip() to unzip it. Look for a word/media/ subdir and the images are there. The source for the docxtractr package has code for the zip part.
    – hrbrmstr
    Nov 14 '18 at 2:09

  • I was indeed able to find the image by using the word/media/ subdir and thus create the path to it. That was simple enough. I did not need to do any unzipping as you mentioned. Anyway, problem solved. Thank you.
    – IanG
    Nov 14 '18 at 21:16

  • A Word doc is a zipped archive, so something had to do the unzipping.
    – hrbrmstr
    Nov 14 '18 at 21:17

  • Then I assume the read_docx function from the officer package must have unzipped it.
    – IanG
    Nov 15 '18 at 5:00
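For anyone landing here later, the approach from the comments can be sketched in base R roughly as follows. This is a minimal sketch, not the code IanG ended up with: the helper name extract_docx_images and its exdir argument are made up for illustration, and it assumes the upload is a valid .docx whose pictures are stored under word/media/ inside the archive. It uses only utils::unzip(), which reads the archive regardless of the file extension, so no renaming to .zip should be needed.

```r
# Hypothetical helper (name and arguments are illustrative, not from officer):
# list the entries of a .docx archive and extract everything under word/media/.
extract_docx_images <- function(docx_path, exdir = tempfile("docx_media_")) {
  stopifnot(file.exists(docx_path))

  # A .docx is a ZIP archive; list = TRUE returns a data frame with a Name column
  entries <- utils::unzip(docx_path, list = TRUE)$Name

  # Keep only files stored in the word/media/ folder of the archive
  media <- grep("^word/media/.+", entries, value = TRUE)
  if (length(media) == 0) {
    return(character(0))
  }

  # Extract just those entries and return their paths on disk
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  utils::unzip(docx_path, files = media, exdir = exdir)
  file.path(exdir, media)
}

# Possible use inside the Word app's server function (untested sketch):
# output$image1 <- renderImage({
#   req(input$uploadedfile)
#   imgs <- extract_docx_images(input$uploadedfile$datapath)
#   req(length(imgs) > 0)
#   list(src = imgs[1], alt = "Image from uploaded document")
# }, deleteFile = FALSE)
```

Since the question states that each document contains a single image, the commented usage simply takes the first extracted file. IanG's comment suggests officer has already unpacked the archive by the time read_docx returns, so the same word/media/ folder may be reachable through the officer object as well; that route depends on officer internals, which is why the sketch sticks to base R.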


















